ELQA: A Corpus of Metalinguistic Questions and Answers about English (2205.00395v2)

Published 1 May 2022 in cs.CL

Abstract: We present ELQA, a corpus of questions and answers in and about the English language. Collected from two online forums, the >70k questions (from English learners and others) cover wide-ranging topics including grammar, meaning, fluency, and etymology. The answers include descriptions of general properties of English vocabulary and grammar as well as explanations about specific (correct and incorrect) usage examples. Unlike most NLP datasets, this corpus is metalinguistic -- it consists of language about language. As such, it can facilitate investigations of the metalinguistic capabilities of NLU models, as well as educational applications in the language learning domain. To study this, we define a free-form question answering task on our dataset and conduct evaluations on multiple LLMs to analyze their capacity to generate metalinguistic answers.
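The paper's free-form QA task pairs model-generated answers with human reference answers, which are typically compared with overlap-based text-generation metrics. As an illustrative sketch only (the QA pair below is hypothetical and the paper's actual evaluation setup is not reproduced here), a ROUGE-1-style unigram F1 between a candidate and a reference answer can be computed as:

```python
from collections import Counter

def unigram_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 (ROUGE-1-style) between two answers."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most
    # min(count in candidate, count in reference) times.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# Hypothetical metalinguistic QA pair in the spirit of ELQA
reference = ('"Affect" is usually a verb meaning to influence, '
             'while "effect" is usually a noun meaning a result.')
candidate = ('"Affect" is typically a verb (to influence something); '
             '"effect" is typically a noun (a result).')

score = unigram_f1(candidate, reference)
print(f"unigram F1: {score:.2f}")
```

Overlap metrics like this are known to correlate weakly with answer quality for long-form, explanatory answers, which is part of why evaluating metalinguistic generation remains difficult.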

