Benchmarks for Pirá 2.0, a Reading Comprehension Dataset about the Ocean, the Brazilian Coast, and Climate Change (2309.10945v1)
Abstract: Pirá is a reading comprehension dataset focused on the ocean, the Brazilian coast, and climate change, built from a collection of scientific abstracts and reports on these topics. This dataset is a versatile language resource, particularly useful for testing the ability of current machine learning models to acquire expert scientific knowledge. Despite its potential, a detailed set of baselines has not yet been developed for Pirá. Establishing such baselines makes it easier for researchers to use Pirá to test machine learning models across a wide range of question answering tasks. In this paper, we define six benchmarks over the Pirá dataset, covering closed generative question answering, machine reading comprehension, information retrieval, open question answering, answer triggering, and multiple choice question answering. As part of this effort, we have also produced a curated version of the original dataset, in which we fixed a number of grammar issues, repetitions, and other shortcomings. Furthermore, the dataset has been extended in several new directions to support these benchmarks: translation of the supporting texts from English into Portuguese, classification labels for answerability, automatic paraphrases of questions and answers, and multiple choice candidates. The results described in this paper provide several points of reference for researchers interested in exploring the challenges posed by the Pirá dataset.
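To illustrate the information retrieval benchmark, the sketch below ranks candidate supporting texts for a question using Okapi BM25, a standard lexical retrieval baseline. It is a minimal sketch under stated assumptions and does not reproduce the paper's setup. The tokenizer, the parameter values k1 = 1.5 and b = 0.75, the function names `tokenize`, `bm25_scores` and `rank`, and the three toy passages are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase, keep Unicode word characters,
    # so accented Portuguese words survive, and drop punctuation.
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in tokenized:
        df.update(set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Lucene-style IDF, which never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and length normalisation (b).
            denom = freq + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * freq * (k1 + 1) / denom
        scores.append(score)
    return scores


def rank(query: str, docs: list[str], top_k: int = 3) -> list[int]:
    """Return the indices of the top_k documents, best first (ties keep input order)."""
    scores = bm25_scores(query, docs)
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


# Hypothetical toy passages standing in for supporting texts.
docs = [
    "Mangroves along the Brazilian coast store large amounts of carbon.",
    "Sea level rise threatens coastal cities in Brazil.",
    "Coral reefs are sensitive to ocean warming and acidification.",
]
query = "How does ocean warming affect coral reefs?"
print(rank(query, docs))  # the coral-reef passage (index 2) comes first
```

In a real baseline, the corpus would presumably be the dataset's supporting texts, either the English originals or their Portuguese translations. The ranked list would then be compared against the passage each question was written from. Which retrievers and metrics the paper actually uses is not stated in this section.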
https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Wagner Filho, J.A., Wilkens, R., Idiart, M., Villavicencio, A.: The brWaC Corpus: A New Open Resource for Brazilian Portuguese. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan (2018). https://aclanthology.org/L18-1686 (8) Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language Models are Few-Shot Learners. CoRR abs/2005.14165 (2020) 2005.14165 (9) OpenAI: Gpt-4 technical report (2023) arXiv:2303.08774 (10) Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y.: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023) (11) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100, 000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). 
https://doi.org/10.18653/v1/d16-1264 (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276 (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. 
CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) 
Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. 
In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language Models are Few-Shot Learners. CoRR abs/2005.14165 (2020) 2005.14165 (9) OpenAI: Gpt-4 technical report (2023) arXiv:2303.08774 (10) Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y.: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023) (11) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100, 000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) 
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). https://doi.org/10.18653/v1/d16-1264
(12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276
(13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124
(14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018).
(15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190
(16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423
(17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019).
(18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7
(19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28
(21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019
(22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550
(23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29
(24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021).
(25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022).
(26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237
(27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128
(28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450
(29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398
(30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
(31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
(32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
(33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
(34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016).
(35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018).
(36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
(38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013).
(39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
(40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997).
https://doi.org/10.1558/cj.v14i2-4.15-33
(41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
(42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
(43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
https://doi.org/10.18653/v1/d16-1264 (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276 (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. 
CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) 
Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29
(24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021)
(25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022)
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). 
https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). 
https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. 
The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). 
https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28
(33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
(34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
(35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
(36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
(38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
(39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
(40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
(41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
(42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
(43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
(16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423
(17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019)
(18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7
(19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28
(21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019
(22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550
(23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29
(24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021)
(25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022)
(26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237
(27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128
(28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450
(29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398
(30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
(31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
(32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022).
https://doi.org/10.1007/978-3-030-98305-5_28
(24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models.
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. 
Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. 
The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. 
In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) 
Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 
248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). 
https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) 
Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). 
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). 
https://doi.org/10.1109/BRACIS.2017.44
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Carmo, D., Piau, M., Campiotti, I., Nogueira, R., de Alencar Lotufo, R.: PTT5: Pretraining and Validating the T5 Model on Brazilian Portuguese Data. CoRR abs/2008.09144 (2020) 2008.09144 (6) Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou, R., Siddhant, A., Barua, A., Raffel, C.: mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer. In: Toutanova, K., Rumshisky, A., Zettlemoyer, L., Hakkani-Tür, D., Beltagy, I., Bethard, S., Cotterell, R., Chakraborty, T., Zhou, Y. (eds.) Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, June 6-11, 2021, pp. 483–498. Association for Computational Linguistics, Online (2021). https://doi.org/10.18653/v1/2021.naacl-main.41. https://doi.org/10.18653/v1/2021.naacl-main.41 (7) Wagner Filho, J.A., Wilkens, R., Idiart, M., Villavicencio, A.: The brWaC Corpus: A New Open Resource for Brazilian Portuguese. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan (2018). https://aclanthology.org/L18-1686 (8) Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language Models are Few-Shot Learners. 
CoRR abs/2005.14165 (2020) 2005.14165 (9) OpenAI: Gpt-4 technical report (2023) arXiv:2303.08774 (10) Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y.: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023) (11) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100, 000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). https://doi.org/10.18653/v1/d16-1264 (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276 (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). 
https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). 
https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) 
Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. 
In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language Models are Few-Shot Learners. CoRR abs/2005.14165 (2020) 2005.14165 (9) OpenAI: Gpt-4 technical report (2023) arXiv:2303.08774 (10) Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y.: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023) (11) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100, 000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) 
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). https://doi.org/10.18653/v1/d16-1264 (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276 (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). 
https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). 
https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. 
The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 OpenAI: Gpt-4 technical report (2023) arXiv:2303.08774 (10) Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y.: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023) (11) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100, 000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) 
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). https://doi.org/10.18653/v1/d16-1264 (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276 (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). 
https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). 
https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. 
The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33
(41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
(42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
(43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. 
Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. 
CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) 
Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. 
In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). 
https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). 
https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. 
The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). 
https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) 
Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). 
https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. 
Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). 
https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) 
Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). 
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Carmo, D., Piau, M., Campiotti, I., Nogueira, R., de Alencar Lotufo, R.: PTT5: Pretraining and Validating the T5 Model on Brazilian Portuguese Data. CoRR abs/2008.09144 (2020) 2008.09144 (6) Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou, R., Siddhant, A., Barua, A., Raffel, C.: mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer. In: Toutanova, K., Rumshisky, A., Zettlemoyer, L., Hakkani-Tür, D., Beltagy, I., Bethard, S., Cotterell, R., Chakraborty, T., Zhou, Y. (eds.) Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, June 6-11, 2021, pp. 483–498. Association for Computational Linguistics, Online (2021). https://doi.org/10.18653/v1/2021.naacl-main.41. https://doi.org/10.18653/v1/2021.naacl-main.41 (7) Wagner Filho, J.A., Wilkens, R., Idiart, M., Villavicencio, A.: The brWaC Corpus: A New Open Resource for Brazilian Portuguese. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan (2018). https://aclanthology.org/L18-1686 (8) Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language Models are Few-Shot Learners. 
CoRR abs/2005.14165 (2020) 2005.14165 (9) OpenAI: Gpt-4 technical report (2023) arXiv:2303.08774 (10) Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y.: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023) (11) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100, 000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). https://doi.org/10.18653/v1/d16-1264 (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276 (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). 
https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). 
https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou, R., Siddhant, A., Barua, A., Raffel, C.: mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer. In: Toutanova, K., Rumshisky, A., Zettlemoyer, L., Hakkani-Tür, D., Beltagy, I., Bethard, S., Cotterell, R., Chakraborty, T., Zhou, Y. (eds.) Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, June 6-11, 2021, pp. 483–498. Association for Computational Linguistics, Online (2021). https://doi.org/10.18653/v1/2021.naacl-main.41. https://doi.org/10.18653/v1/2021.naacl-main.41 (7) Wagner Filho, J.A., Wilkens, R., Idiart, M., Villavicencio, A.: The brWaC Corpus: A New Open Resource for Brazilian Portuguese. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan (2018). https://aclanthology.org/L18-1686 (8) Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language Models are Few-Shot Learners. 
CoRR abs/2005.14165 (2020) 2005.14165 (9) OpenAI: Gpt-4 technical report (2023) arXiv:2303.08774 (10) Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y.: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023) (11) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100, 000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). https://doi.org/10.18653/v1/d16-1264 (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276 (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). 
https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). 
https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Wagner Filho, J.A., Wilkens, R., Idiart, M., Villavicencio, A.: The brWaC Corpus: A New Open Resource for Brazilian Portuguese. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan (2018). https://aclanthology.org/L18-1686 (8) Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language Models are Few-Shot Learners. CoRR abs/2005.14165 (2020) 2005.14165 (9) OpenAI: Gpt-4 technical report (2023) arXiv:2303.08774 (10) Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y.: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023) (11) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100, 000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). 
https://doi.org/10.18653/v1/d16-1264 (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276 (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. 
CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) 
Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. 
In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language Models are Few-Shot Learners. CoRR abs/2005.14165 (2020) 2005.14165 (9) OpenAI: Gpt-4 technical report (2023) arXiv:2303.08774 (10) Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y.: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023) (11) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100, 000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) 
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). https://doi.org/10.18653/v1/d16-1264 (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276 (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). 
https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). 
https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. 
The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 OpenAI: Gpt-4 technical report (2023) arXiv:2303.08774 (10) Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y.: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023) (11) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100, 000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) 
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). https://doi.org/10.18653/v1/d16-1264 (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276 (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). 
https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). 
https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. 
The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
(43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). 
https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) 
Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
(39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
(40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
(41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
(42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
(43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
(17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019)
(18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7
(19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28
(21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019
(22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550
(23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29
(24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021)
(25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022)
(26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237
(27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128
(28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450
(29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398
(30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
(31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
(32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
(33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
(34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
(35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
(36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
(38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text.
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). 
https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. 
In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) 
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). 
https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. 
CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) 
Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). 
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. 
The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. 
In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450
(29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398
(30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
(31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
(32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
(33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
(34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016).
(35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018).
(36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
(38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013).
(39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
(40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
(41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
(42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
(43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.)
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). 
https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) 
Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). 
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. 
In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
- Carmo, D., Piau, M., Campiotti, I., Nogueira, R., de Alencar Lotufo, R.: PTT5: Pretraining and Validating the T5 Model on Brazilian Portuguese Data. CoRR abs/2008.09144 (2020)
(6) Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou, R., Siddhant, A., Barua, A., Raffel, C.: mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer. In: Toutanova, K., Rumshisky, A., Zettlemoyer, L., Hakkani-Tür, D., Beltagy, I., Bethard, S., Cotterell, R., Chakraborty, T., Zhou, Y. (eds.) Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, June 6-11, 2021, pp. 483–498. Association for Computational Linguistics, Online (2021). https://doi.org/10.18653/v1/2021.naacl-main.41
(7) Wagner Filho, J.A., Wilkens, R., Idiart, M., Villavicencio, A.: The brWaC Corpus: A New Open Resource for Brazilian Portuguese. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan (2018). https://aclanthology.org/L18-1686
(8) Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language Models are Few-Shot Learners. CoRR abs/2005.14165 (2020)
(9) OpenAI: GPT-4 Technical Report (2023). arXiv:2303.08774
(10) Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y.: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023)
(11) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). https://doi.org/10.18653/v1/d16-1264
(12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276
(13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124
(14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018)
(15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190
(16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423
(17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019)
(18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7
(19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28
(21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019
(22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550
(23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29
(24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021)
(25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022)
(26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237
(27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128
(28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450
(29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398
(30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
(31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
(32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022).
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou, R., Siddhant, A., Barua, A., Raffel, C.: mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer. In: Toutanova, K., Rumshisky, A., Zettlemoyer, L., Hakkani-Tür, D., Beltagy, I., Bethard, S., Cotterell, R., Chakraborty, T., Zhou, Y. (eds.) Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, June 6-11, 2021, pp. 483–498. Association for Computational Linguistics, Online (2021). https://doi.org/10.18653/v1/2021.naacl-main.41. https://doi.org/10.18653/v1/2021.naacl-main.41 (7) Wagner Filho, J.A., Wilkens, R., Idiart, M., Villavicencio, A.: The brWaC Corpus: A New Open Resource for Brazilian Portuguese. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan (2018). https://aclanthology.org/L18-1686 (8) Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language Models are Few-Shot Learners. 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language Models are Few-Shot Learners. CoRR abs/2005.14165 (2020) 2005.14165 (9) OpenAI: Gpt-4 technical report (2023) arXiv:2303.08774 (10) Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y.: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023) (11) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100, 000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) 
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). https://doi.org/10.18653/v1/d16-1264 (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276 (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). 
https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). 
https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. 
The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 OpenAI: Gpt-4 technical report (2023) arXiv:2303.08774 (10) Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y.: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023) (11) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100, 000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) 
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). https://doi.org/10.18653/v1/d16-1264 (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276 (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). 
https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). 
https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. 
The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100, 000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). 
https://doi.org/10.18653/v1/d16-1264 (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276 (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. 
CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) 
Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. 
In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276 (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? 
In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. 
Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) 
Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). 
https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). 
https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. 
Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). 
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. 
In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. 
ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. 
In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. 
Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
- (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
- (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
- (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
- (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
- (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
- (6) Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou, R., Siddhant, A., Barua, A., Raffel, C.: mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer. In: Toutanova, K., Rumshisky, A., Zettlemoyer, L., Hakkani-Tür, D., Beltagy, I., Bethard, S., Cotterell, R., Chakraborty, T., Zhou, Y. (eds.) Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, June 6-11, 2021, pp. 483–498. Association for Computational Linguistics, Online (2021). https://doi.org/10.18653/v1/2021.naacl-main.41
- (7) Wagner Filho, J.A., Wilkens, R., Idiart, M., Villavicencio, A.: The brWaC Corpus: A New Open Resource for Brazilian Portuguese. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan (2018).
https://aclanthology.org/L18-1686
- (8) Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language Models are Few-Shot Learners. CoRR abs/2005.14165 (2020)
- (9) OpenAI: GPT-4 Technical Report (2023). arXiv:2303.08774
- (10) Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y.: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023)
- (11) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). https://doi.org/10.18653/v1/d16-1264
- (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276
- (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124
- (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018)
- (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190
- (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423
- (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019)
- (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7
- (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
- (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28
- (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019
- (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550
- (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29
- (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021)
- (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022)
- (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237
- (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128
- (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153.
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language Models are Few-Shot Learners. CoRR abs/2005.14165 (2020) 2005.14165 (9) OpenAI: Gpt-4 technical report (2023) arXiv:2303.08774 (10) Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y.: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023) (11) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100, 000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). https://doi.org/10.18653/v1/d16-1264 (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y.: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023) (11) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100, 000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). https://doi.org/10.18653/v1/d16-1264 (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). 
https://doi.org/10.1162/tacl_a_00276 (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). 
https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100, 000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). https://doi.org/10.18653/v1/d16-1264 (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276 (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) 
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) 
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082
(37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
(38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013).
(39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
(40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
(41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
(42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
(43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). 
https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) 
Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). 
https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021)
(25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022)
(26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237
(27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128
(28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450
(29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398
(30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
(31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
(32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
(33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
(34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
(35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
(36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
(38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
(39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
(40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
(41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
(42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
(43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
(18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7
(19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28
(21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019
(22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550
(23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29
Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). 
https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. 
CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) 
Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). 
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. 
The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. 
In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) 
Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 
248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
- (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450
- (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398
- (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
- (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
- (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
- (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
- (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
- (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
- (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
- (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
- (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
- (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
- (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
- (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
- (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
- (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
- (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
- Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp.
785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. 
(eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. 
In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). 
https://doi.org/10.1109/BRACIS.2017.44 Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). 
https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. 
Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). 
https://doi.org/10.1109/BRACIS.2017.44
- Wagner Filho, J.A., Wilkens, R., Idiart, M., Villavicencio, A.: The brWaC Corpus: A New Open Resource for Brazilian Portuguese. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan (2018). https://aclanthology.org/L18-1686 (8) Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language Models are Few-Shot Learners. CoRR abs/2005.14165 (2020) 2005.14165 (9) OpenAI: Gpt-4 technical report (2023) arXiv:2303.08774 (10) Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y.: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023) (11) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100, 000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). https://doi.org/10.18653/v1/d16-1264 (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). 
https://doi.org/10.1162/tacl_a_00276 (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). 
https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 (8) Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language Models are Few-Shot Learners. CoRR abs/2005.14165 (2020) (9) OpenAI: GPT-4 Technical Report. arXiv:2303.08774 (2023) (10) Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y.: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023) (11) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) 
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). https://doi.org/10.18653/v1/d16-1264 (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276 (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). 
https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). 
https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. 
The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33
(41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
(42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
(43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) 
Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. 
In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). 
https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). 
https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. 
The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). 
https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text.
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) 
Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. 
Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html
(32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
(33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
(34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016).
(35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018).
(36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017).
https://doi.org/10.18653/v1/d17-1082
(37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
(38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013).
(39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
(40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
(41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
(42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.)
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
(43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). 
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. 
30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). 
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. 
In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. 
ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. 
In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. 
Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 
785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. 
(eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. 
In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). 
https://doi.org/10.1109/BRACIS.2017.44 Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). 
https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. 
Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). 
https://doi.org/10.1109/BRACIS.2017.44
- Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language Models are Few-Shot Learners. CoRR abs/2005.14165 (2020) 2005.14165 (9) OpenAI: Gpt-4 technical report (2023) arXiv:2303.08774 (10) Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y.: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023) (11) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100, 000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). https://doi.org/10.18653/v1/d16-1264 (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276 (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). 
https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 OpenAI: Gpt-4 technical report (2023) arXiv:2303.08774 (10) Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y.: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023) (11) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100, 000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). https://doi.org/10.18653/v1/d16-1264 (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276 (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. 
CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) 
Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) 
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. 
CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) 
Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. 
(eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29
(24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021)
(25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022)
(26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237
(27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128
(28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450
(29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398
(30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
(31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
(32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
(33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
(34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
(35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
(36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
(38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
(39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
(40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
(41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
(42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.)
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
(43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. 
Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). 
https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) 
Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). 
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. 
In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. 
Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100, 000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). https://doi.org/10.18653/v1/d16-1264 (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276 (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). 
https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). 
https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. 
In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). 
https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013). (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171
(24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021)
(25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022)
(26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237
(27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128
(28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450
(29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398
(30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
(31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
(32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
(33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
(34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
(35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
(36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
(38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
(39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
(40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
(41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
(42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
(43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) 
Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). 
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. 
In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. 
ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. 
In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
- Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y.: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023) (11) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). https://doi.org/10.18653/v1/d16-1264 (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276 (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). 
https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). 
https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. 
(eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. 
Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) 
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) 
Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28
(21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019
(22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550
(23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29
(24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021)
(25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022)
(26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237
(27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128
(28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450
(29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398
(30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
(31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
(32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
(33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
(34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
(35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
(36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
(38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
(39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
(40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
(41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
(42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
(43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). 
https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. 
In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) 
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). 
https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. 
CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) 
Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). 
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. 
The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237
(27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128
(28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450
(29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398
(30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
(31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
(32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
(33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
(34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
(35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
(36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
(38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
(39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
(40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
(41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
(42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
(43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. 
In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. 
ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. 
In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. 
Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 
785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. 
(eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. 
In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). 
https://doi.org/10.1109/BRACIS.2017.44 Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). 
https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. 
Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). 
https://doi.org/10.1109/BRACIS.2017.44
- (11) Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ Questions for Machine Comprehension of Text. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, November 1-4, 2016, pp. 2383–2392. The Association for Computational Linguistics, Austin, Texas, USA (2016). https://doi.org/10.18653/v1/d16-1264
- (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276
- (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124
- (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018)
- (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190
- (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423
- (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019)
- (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7
- (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
- (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28
- (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019
- (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550
- (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29
- (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021)
- (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022)
- (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237
- (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128
- (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450
- (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398
- (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
- (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
- (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
- (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
- (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
- (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
- (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
- (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
- (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
- (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
- (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
- (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
- (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
- (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
- (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. 
(eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. 
CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) 
Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. 
In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). 
https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. 
Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models.
CoRR abs/2112.04359 (2021) (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128
In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) 
Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 
248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). 
https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) 
Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). 
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html
(32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
(33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
(34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
(35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
(36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
(38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
(39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
(40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
(41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
(42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
(43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
- (12) Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A.P., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M., Dai, A.M., Uszkoreit, J., Le, Q., Petrov, S.: Natural Questions: a Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguistics 7, 452–466 (2019). https://doi.org/10.1162/tacl_a_00276
- (13) Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124
- (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018)
- (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190
- (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423
- (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019)
- (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7
- (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
- (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28
- (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019
- (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550
- (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29
- (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021)
- (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022)
- (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237
- (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128
- (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450
- (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398
- (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
- (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
- (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
- (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
- (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
- (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
- (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
- (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
- (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
- (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
- (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
- (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
- (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
- (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
- (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. 
CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) 
Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. 
In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). 
https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016). (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text.
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
(43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) 
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). 
https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. 
CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) 
Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). 
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. 
The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. 
In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) 
Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 
248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). 
https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) 
Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). 
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. 
In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
- Rajpurkar, P., Jia, R., Liang, P.: Know What You Don’t Know: Unanswerable Questions for SQuAD. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, July 15-20, 2018, Volume 2: Short Papers, pp. 784–789. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-2124 (14) Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018). (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019). (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019).
https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021). (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022). (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016).
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016). (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018). (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017).
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) 1808.07042 (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). 
https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). 
https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. 
The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). 
https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) 
Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). 
https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. 
In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) 
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). 
https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. 
CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) 
Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). 
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. 
The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. 
In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) 
Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 
248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). 
https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) 
Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). 
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
- Reddy, S., Chen, D., Manning, C.D.: CoQA: A Conversational Question Answering Challenge. CoRR abs/1808.07042 (2018) (15) Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. 
Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) 
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) 
Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). 
ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. 
In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. 
Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 
785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. 
(eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. 
In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). 
https://doi.org/10.1109/BRACIS.2017.44 Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). 
https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. 
Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). 
https://doi.org/10.1109/BRACIS.2017.44
- Sen, P., Saffari, A.: What do Models Learn from Question Answering Datasets? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2429–2438. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.190 (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019). (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.)
Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021). (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022). (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153.
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423 (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) 
Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019) 1907.11692 (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). 
https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. 
Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550
(23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29
(24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021)
(25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022)
(26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237
(27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128
(28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450
(29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398
(30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
(31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
(32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
(33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
(34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
(35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
(36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
(38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
(39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
(40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997).
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) 
Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. 
Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013). (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. 
The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. 
In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) 
Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 
248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). 
https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
(42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). 
https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. 
Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). 
https://doi.org/10.1109/BRACIS.2017.44
- (16) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423
- (17) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019)
- (18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7
- (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
- (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28
- (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019
- (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550
- (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29
- (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021)
- (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022)
- (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237
- (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128
- (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450
- (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398
- (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
- (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
- (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
- (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
- (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
- (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
- (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
- (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
- (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
- (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
- (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
- (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
- (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
- (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
- (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7 (19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781.
Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) 
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). 
https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. 
In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
(32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
(33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
(34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
(35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
(36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
(38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
(39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
(40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
(41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
(42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
(43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
(25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022)
(26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237
(27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128
(28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450
(29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398
(30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.)
Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.)
Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). 
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. 
In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. 
ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. 
In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. 
Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
(40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
(41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
(42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
(43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
- Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019)
(18) Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In: 7th International Conference on Learning Representations, ICLR 2019, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=rJ4km2R5t7
(19) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28
(21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009).
https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. 
Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. 
In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) 
Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 
248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). 
https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) 
Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). 
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). 
https://doi.org/10.1109/BRACIS.2017.44
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. 
Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). 
https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). 
https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) 
Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). 
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. 
In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
- Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale Reading Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(20) Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28
(21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019
(22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550
(23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29
(24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021)
(25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022)
(26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237
(27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128
(28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450
(29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398
(30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
(31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
(32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
(33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
(34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
(35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
(36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
(38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
(39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
(40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
(41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
(42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.)
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
(43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. 
In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) 
Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 
248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. 
In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) 
Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. 
ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. 
In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016). (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018). (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013). (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). 
https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) 
Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 
248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. 
In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). 
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) 
Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 
248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
- Souza, F., Nogueira, R., de Alencar Lotufo, R.: BERTimbau: Pretrained BERT Models for Brazilian Portuguese. In: Cerri, R., Prati, R.C. (eds.) Intelligent Systems - 9th Brazilian Conference, BRACIS 2020, October 20-23, 2020, Proceedings, Part I. Lecture Notes in Computer Science, vol. 12319, pp. 403–417. Springer, Rio Grande, Brazil (2020). https://doi.org/10.1007/978-3-030-61377-8_28 (21) Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. 
In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) 
Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 
248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. 
In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) 
Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. 
ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. 
In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). 
https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. 
In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). 
https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). 
https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. 
IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. 
Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
- Robertson, S.E., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019 (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. 
AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550 (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. 
CoRR abs/2112.04359 (2021)
(25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022)
(26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237
(27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128
(28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450
(29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398
(30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
(31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
(32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
(33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
(34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
(35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
(36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
(38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
(39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
(40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
(41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
(42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
(43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). 
https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. 
In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). 
https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). 
https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. 
IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
- (22) Karpukhin, V., Oguz, B., Min, S., Lewis, P.S.H., Wu, L., Edunov, S., Chen, D., Yih, W.: Dense Passage Retrieval for Open-Domain Question Answering. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, November 16-20, 2020, pp. 6769–6781. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.550
- (23) Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29
- (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021)
- (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022)
- (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237
- (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128
- (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450
- (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398
- (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
- (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
- (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
- (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
- (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
- (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
- (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
- (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
- (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
- (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
- (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
- (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
- (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
- (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
- (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks.
In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) 
Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. 
ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. 
In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). 
https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) 
Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 
248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. 
In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
- Cação, F.N., José, M.M., Oliveira, A.S., Spindola, S., Costa, A.H.R., Cozman, F.G.: DEEPAGÉ: Answering Questions in Portuguese About the Brazilian Environment. In: Britto, A., Delgado, K.V. (eds.) Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, November 29 - December 3, 2021, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13074, pp. 419–433. Springer, Virtual Event (2021). https://doi.org/10.1007/978-3-030-91699-2_29 (24) Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. 
The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. 
In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). 
https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) 
Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 
248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. 
In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171
(31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
(32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
(33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
(34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
(35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
(36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
(38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
(39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
(40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
(41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
(42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
(43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
(44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
- Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., Gabriel, I.: Ethical and Social Risks of Harm from Language Models. CoRR abs/2112.04359 (2021) 2112.04359 (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). 
https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 2201.08239 (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) 
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237 (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. 
In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). 
http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). 
https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. 
In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) 
Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. 
ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. 
In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge.
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. 
In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). 
https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. 
Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. 
Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). 
https://doi.org/10.1109/BRACIS.2017.44 CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). 
https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
- (25) Thoppilan, R., Freitas, D.D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H., Jin, A., Bos, T., Baker, L., Du, Y., Li, Y., Lee, H., Zheng, H.S., Ghafouri, A., Menegali, M., Huang, Y., Krikun, M., Lepikhin, D., Qin, J., Chen, D., Xu, Y., Chen, Z., Roberts, A., Bosma, M., Zhou, Y., Chang, C., Krivokon, I., Rusch, W., Pickett, M., Meier-Hellstern, K.S., Morris, M.R., Doshi, T., Santos, R.D., Duke, T., Soraker, J., Zevenbergen, B., Prabhakaran, V., Diaz, M., Hutchinson, B., Olson, K., Molina, A., Hoffman-John, E., Lee, J., Aroyo, L., Rajakumar, R., Butryna, A., Lamm, M., Kuzmina, V., Fenton, J., Cohen, A., Bernstein, R., Kurzweil, R., Aguera-Arcas, B., Cui, C., Croak, M., Chi, E.H., Le, Q.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022)
- (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237
- (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128
- (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450
- (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398
- (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
- (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
- (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
- (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
- (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203.
ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. 
In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). 
https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. 
CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). 
https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) 
Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 
248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. 
In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
- Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
- Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
- José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
- Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
- Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
- Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
- Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
- Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
- Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
- Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
- CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
- Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
- Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
- Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
- (26) Yang, Y., Yih, W., Meek, C.: WikiQA: A Challenge Dataset for Open-Domain Question Answering. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, September 17-21, 2015, pp. 2013–2018. The Association for Computational Linguistics, Lisbon, Portugal (2015). https://doi.org/10.18653/v1/d15-1237
- (27) Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128
- (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450
- (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398
- (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
- (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
- (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
- (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
- (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
- (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
- (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
- (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
- (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
- (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
- (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
- (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). 
https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) 
Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. 
ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. 
In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). 
https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) 
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). 
https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. 
ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. 
In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). 
https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. 
Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. 
EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. 
In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. 
IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
- Jurczyk, T., Zhai, M., Choi, J.D.: SelQA: A New Benchmark for Selection-Based Question Answering. In: 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016, November 6-8, 2016, pp. 820–827. IEEE Computer Society, San Jose, CA, USA (2016). https://doi.org/10.1109/ICTAI.2016.0128 (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450 (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization.
In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.)
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020).
https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022).
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) 
Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). 
https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 
174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. 
ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. 
In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). 
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. 
PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. 
(eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. 
In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
- (28) Wang, B., Yao, T., Zhang, Q., Xu, J., Wang, X.: ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 9146–9153. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6450
- (29) Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398
- (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
- (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
- (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
- (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
- (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
- (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
- (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
- (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
- (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
- (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
- (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
- (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
- (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
- (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
- (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. 
Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). 
https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. 
Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. 
Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). 
https://doi.org/10.1109/BRACIS.2017.44 Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
- Rogers, A., Kovaleva, O., Downey, M., Rumshisky, A.: Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, February 7-12, 2020, pp. 8722–8731. AAAI Press, New York, NY, USA (2020). https://ojs.aaai.org/index.php/AAAI/article/view/6398 (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33 (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. 
In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? 
Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). 
https://doi.org/10.1007/978-3-030-98305-5_28 (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26 (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) 
Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). 
https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 
174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. 
ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. 
In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). 
https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. 
PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
- (30) Pirozelli, P., Brandão, A.A.F., Peres, S.M., Cozman, F.G.: To Answer or Not to Answer? Filtering Questions for QA Systems. In: Junior, J.C.X., Rios, R.A. (eds.) Intelligent Systems - 11th Brazilian Conference, BRACIS 2022, November 28 - December 1, 2022, Proceedings, Part II. Lecture Notes in Computer Science, vol. 13654, pp. 464–478. Springer, Campinas, Brazil (2022). https://doi.org/10.1007/978-3-031-21689-3_33
- (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
- (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
- (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
- (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
- (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
- (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
- (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
- (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
- (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
- (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
- (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
- (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
- (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
- (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. 
ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. 
In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). 
https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. 
Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. 
EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. 
In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. 
IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
- (31) Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020. Proceedings of Machine Learning Research, vol. 119, pp. 11328–11339. PMLR, Virtual Event (2020). http://proceedings.mlr.press/v119/zhang20ae.html
- (32) Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
- (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
- (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
- (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
- (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
- (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
- (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
- (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
- (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
- (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
- (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
- (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
- (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018) 1803.05457 (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.)
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) 
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). 
https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. 
ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. 
In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). 
https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. 
Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. 
EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. 
In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. 
IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
- Pellicer, L.F.A.O., Pirozelli, P., Costa, A.H.R., Inoue, A.: PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 299–309. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_28
(33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
(34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
(35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
(36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
(37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
(38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
(39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
(40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
(41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
(42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.)
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082 (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. 
In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). 
https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). 
https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. 
Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) 
Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). 
https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. 
Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). 
https://doi.org/10.1109/BRACIS.2017.44 Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
- (33) José, M.M., José, M.A., Mauá, D.D., Cozman, F.G.: Integrating Question Answering and Text-to-SQL in Portuguese. In: Pinheiro, V., Gamallo, P., Amaro, R., Scarton, C., Batista, F., Silva, D.F., Magro, C., Pinto, H. (eds.) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, March 21-23, 2022, Proceedings. Lecture Notes in Computer Science, vol. 13208, pp. 278–287. Springer, Fortaleza, Brazil (2022). https://doi.org/10.1007/978-3-030-98305-5_26
- (34) Clark, P., Etzioni, O., Khot, T., Sabharwal, A., Tafjord, O., Turney, P., Khashabi, D.: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
- (35) Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
- (36) Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
- (37) Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
- (38) Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013)
- (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
- (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
- (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
- (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
- (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
- (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge.
In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). 
https://doi.org/10.1109/BRACIS.2017.44 Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013) (39) Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119 (40) Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). 
https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33 (41) CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. 
Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). 
https://doi.org/10.1109/BRACIS.2017.44
https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
- Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., Tafjord, O.: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. CoRR abs/1803.05457 (2018)
- Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.H.: RACE: Large-scale ReAding Comprehension Dataset From Examinations. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, September 9-11, 2017, pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/d17-1082
- Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
- Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
- Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
- CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
- Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
- Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol.
EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100 (42) Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. 
In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166 (43) Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171 (44) Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44 Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. 
IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44
- Pal, A., Umapathi, L.K., Sankarasubbu, M.: MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In: Flores, G., Chen, G.H., Pollard, T.J., Ho, J.C., Naumann, T. (eds.) Conference on Health, Inference, and Learning, CHIL 2022, 7-8 April 2022. Proceedings of Machine Learning Research, vol. 174, pp. 248–260. PMLR, Virtual Event (2022). https://proceedings.mlr.press/v174/pal22a.html
- Richardson, M., Burges, C.J.C., Renshaw, E.: MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 193–203. ACL, Grand Hyatt Seattle, Seattle, Washington, USA (2013).
- Ostermann, S., Roth, M., Modi, A., Thater, S., Pinkal, M.: SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. In: Apidianaki, M., Mohammad, S.M., May, J., Shutova, E., Bethard, S., Carpuat, M. (eds.) Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2018, June 5-6, 2018, pp. 747–757. Association for Computational Linguistics, New Orleans, Louisiana, USA (2018). https://doi.org/10.18653/v1/s18-1119
- Coniam, D.: A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests. CALICO Journal 14 (1997). https://doi.org/10.1558/cj.v14i2-4.15-33
- CH, D.R., Saha, S.K.: Automatic Multiple Choice Question Generation From Text: A Survey. IEEE Transactions on Learning Technologies 13(1), 14–25 (2020). https://doi.org/10.1109/TLT.2018.2889100
- Yagcioglu, S., Erdem, A., Erdem, E., Ikizler-Cinbis, N.: RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, pp. 1358–1368. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/d18-1166
- Khashabi, D., Min, S., Khot, T., Sabharwal, A., Tafjord, O., Clark, P., Hajishirzi, H.: UnifiedQA: Crossing Format Boundaries With a Single QA System. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 1896–1907. Association for Computational Linguistics, Online Event (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.171
- Silveira, I.C., Mauá, D.D.: University Entrance Exam as a Guiding Test for Artificial Intelligence. In: 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, October 2-5, 2017, pp. 426–431. IEEE Computer Society, Uberlândia, Brazil (2017). https://doi.org/10.1109/BRACIS.2017.44