
Releasing the CRaQAn (Coreference Resolution in Question-Answering): An open-source dataset and dataset creation methodology using instruction-following models (2311.16338v1)

Published 27 Nov 2023 in cs.CL and cs.AI

Abstract: Instruction-following LLMs demand robust information-retrieval methodologies to augment instructions for question-answering applications. A primary challenge is resolving coreferences when long documents are split by chunking strategies. A critical barrier to experimenting with coreference handling is the lack of open-source datasets, particularly for question-answering tasks that require coreference resolution. In this work, we present the Coreference Resolution in Question-Answering (CRaQAn) dataset, an open-source resource that meets the nuanced information-retrieval requirements of coreference resolution in question-answering tasks by providing over 250 question-answer pairs containing coreferences. To build this dataset, we developed a novel approach for creating high-quality datasets using an instruction-following model (GPT-4) and a Recursive Criticism and Improvement loop.
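The Recursive Criticism and Improvement (RCI) loop named in the abstract can be pictured as a generate-critique-revise cycle: the same instruction-following model drafts a QA pair, audits the draft against the coreference requirement, and rewrites it until the critique passes. Below is a minimal sketch, assuming a hypothetical `complete(prompt)` helper that wraps a chat model such as GPT-4; the prompt wording, the `max_rounds` cap, and the "no issues" acceptance check are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch of a Recursive Criticism and Improvement (RCI) loop for
# generating coreference-bearing QA pairs. `complete` is a hypothetical
# stand-in for a call to an instruction-following model (e.g., GPT-4);
# all prompt text and the stopping rule are illustrative assumptions.

def complete(prompt: str) -> str:
    """Stub for a chat-completion call; wire this to your model API."""
    raise NotImplementedError

def rci_generate_qa(passage: str, max_rounds: int = 3) -> str:
    # 1. Generate: draft a QA pair whose question requires coreference resolution.
    draft = complete(
        "Write one question-answer pair about the passage below. The question "
        "must only be answerable by resolving a coreference (for example, a "
        f"pronoun whose antecedent is in another sentence).\n\nPassage:\n{passage}"
    )
    for _ in range(max_rounds):
        # 2. Criticize: have the model audit its own draft against the criteria.
        critique = complete(
            "Critique the draft question-answer pair below. Does answering the "
            "question genuinely require coreference resolution, and is the "
            "answer supported by the passage? Reply 'no issues' if both hold."
            f"\n\nPassage:\n{passage}\n\nDraft:\n{draft}"
        )
        # 3. Accept the draft once the critique reports no problems (toy check).
        if "no issues" in critique.lower():
            break
        # 4. Improve: revise the draft conditioned on the explicit critique.
        draft = complete(
            "Revise the draft question-answer pair to address the critique."
            f"\n\nPassage:\n{passage}\n\nDraft:\n{draft}\n\nCritique:\n{critique}"
        )
    return draft
```

The point the sketch tries to capture is that criticism and improvement are separate model calls, so each revision is conditioned on an explicit, inspectable critique rather than on a single regeneration prompt.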

Authors (7)
  1. Rob Grzywinski (1 paper)
  2. Joshua D'Arcy (2 papers)
  3. Rob Naidoff (1 paper)
  4. Ashish Shukla (30 papers)
  5. Alex Browne (1 paper)
  6. Ren Gibbons (1 paper)
  7. Brinnae Bent (5 papers)
