Releasing the CRaQAn (Coreference Resolution in Question-Answering): An open-source dataset and dataset creation methodology using instruction-following models (2311.16338v1)
Abstract: Instruction-following LLMs demand robust methodologies for information retrieval to augment instructions for question-answering applications. A primary challenge is resolving coreferences across the chunks produced by chunking strategies for long documents. A critical barrier to experimenting with coreference handling is the lack of open-source datasets, specifically question-answering datasets that require coreference resolution. In this work we present the Coreference Resolution in Question-Answering (CRaQAn) dataset, an open-source dataset that addresses the nuanced information-retrieval requirements of coreference resolution in question-answering tasks by providing over 250 question-answer pairs containing coreferences. To build this dataset, we developed a novel approach for creating high-quality datasets using an instruction-following model (GPT-4) and a Recursive Criticism and Improvement (RCI) loop.
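To make the generation methodology concrete, below is a minimal sketch of a Recursive Criticism and Improvement loop for producing QA pairs that require coreference resolution. The `complete()` wrapper, the prompt wording, and the stopping rule are illustrative assumptions, not the authors' exact implementation or prompts.

```python
def complete(prompt: str) -> str:
    """Hypothetical wrapper around an instruction-following model (e.g. GPT-4)."""
    raise NotImplementedError  # placeholder: swap in a real model API call


def rci_generate_qa(passage: str, max_rounds: int = 3) -> str:
    # Initial generation: ask for a QA pair whose answer requires
    # resolving a coreference across sentences in the passage.
    draft = complete(
        "Write a question-answer pair about the passage below. Answering must "
        f"require resolving a coreference across sentences.\n\n{passage}"
    )
    for _ in range(max_rounds):
        # Criticism step: the model reviews the draft against the task criteria.
        critique = complete(
            "Critique this question-answer pair. Does answering truly require "
            "coreference resolution, and is the answer grounded in the passage? "
            f"Reply 'OK' if no changes are needed.\n\nPassage:\n{passage}\n\n"
            f"QA pair:\n{draft}"
        )
        if critique.strip().upper().startswith("OK"):
            break  # the critic is satisfied; accept the current draft
        # Improvement step: revise the draft using the critique.
        draft = complete(
            f"Revise the QA pair to address this critique.\n\nPassage:\n{passage}"
            f"\n\nQA pair:\n{draft}\n\nCritique:\n{critique}"
        )
    return draft
```

In this sketch, accepted drafts would then pass to human review; the loop simply filters out generations the model itself can identify as flawed before any annotator sees them.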
- Rob Grzywinski
- Joshua D'Arcy
- Rob Naidoff
- Ashish Shukla
- Alex Browne
- Ren Gibbons
- Brinnae Bent