Releasing the CRaQAn (Coreference Resolution in Question-Answering): An open-source dataset and dataset creation methodology using instruction-following models (2311.16338v1)
Abstract: Instruction-following LLMs demand robust methodologies for information retrieval to augment instructions for question-answering applications. A primary challenge is resolving coreferences across the chunks produced by chunking strategies for long documents. A critical barrier to experimenting with coreference handling is the lack of open-source datasets, specifically question-answering datasets that require coreference resolution. In this work we present the Coreference Resolution in Question-Answering (CRaQAn) dataset, an open-source dataset that addresses the nuanced information-retrieval requirements of coreference resolution in question-answering tasks by providing over 250 question-answer pairs containing coreferences. To build this dataset, we developed a novel approach for creating high-quality datasets using an instruction-following model (GPT-4) and a Recursive Criticism and Improvement (RCI) loop.
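To make the generation methodology concrete, below is a minimal sketch of a Recursive Criticism and Improvement loop for producing QA pairs that require coreference resolution. The `complete()` wrapper, the prompt wording, and the stopping rule are illustrative assumptions, not the authors' exact implementation or prompts.

```python
def complete(prompt: str) -> str:
    """Hypothetical wrapper around an instruction-following model (e.g. GPT-4)."""
    raise NotImplementedError  # placeholder: swap in a real model API call


def rci_generate_qa(passage: str, max_rounds: int = 3) -> str:
    # Initial generation: ask for a QA pair whose answer requires
    # resolving a coreference across sentences in the passage.
    draft = complete(
        "Write a question-answer pair about the passage below. Answering must "
        f"require resolving a coreference across sentences.\n\n{passage}"
    )
    for _ in range(max_rounds):
        # Criticism step: the model reviews the draft against the task criteria.
        critique = complete(
            "Critique this question-answer pair. Does answering truly require "
            "coreference resolution, and is the answer grounded in the passage? "
            f"Reply 'OK' if no changes are needed.\n\nPassage:\n{passage}\n\n"
            f"QA pair:\n{draft}"
        )
        if critique.strip().upper().startswith("OK"):
            break  # the critic is satisfied; accept the current draft
        # Improvement step: revise the draft using the critique.
        draft = complete(
            f"Revise the QA pair to address this critique.\n\nPassage:\n{passage}"
            f"\n\nQA pair:\n{draft}\n\nCritique:\n{critique}"
        )
    return draft
```

In this sketch, accepted drafts would then pass to human review; the loop simply filters out generations the model itself can identify as flawed before any annotator sees them.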
- Rob Grzywinski
- Joshua D'Arcy
- Rob Naidoff
- Ashish Shukla
- Alex Browne
- Ren Gibbons
- Brinnae Bent