
NewsQs: Multi-Source Question Generation for the Inquiring Mind (2402.18479v2)

Published 28 Feb 2024 in cs.CL

Abstract: We present NewsQs (news-cues), a dataset that provides question-answer pairs for multiple news documents. To create NewsQs, we augment a traditional multi-document summarization dataset with questions automatically generated by a T5-Large model fine-tuned on FAQ-style news articles from the News On the Web corpus. We show that fine-tuning a model with control codes produces questions that are judged acceptable more often than those from the same model without them, as measured through human evaluation. We filter our data with a QNLI model whose predictions correlate strongly with human annotations. We release our final dataset of high-quality questions, answers, and document clusters as a resource for future work in query-based multi-document summarization.
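The QNLI filtering step described above can be sketched as: score each (question, answer) pair with a QNLI model, then keep only pairs whose entailment score clears a threshold. The sketch below stubs out the model call; the function name, the stub scorer, and the 0.9 threshold are illustrative assumptions, not the paper's actual configuration.

```python
def filter_qa_pairs(pairs, qnli_score, threshold=0.9):
    """Keep (question, answer) pairs whose QNLI score clears the threshold.

    `qnli_score` is any callable returning the probability that the answer
    actually answers (entails) the question; in the paper this role is
    played by a QNLI model validated against human annotations.
    """
    return [(q, a) for q, a in pairs if qnli_score(q, a) >= threshold]


# Usage with a stub scorer standing in for a real QNLI model:
pairs = [
    ("What caused the outage?", "A fiber cut disrupted service."),
    ("Who won the match?", "Stocks fell sharply on Monday."),
]
stub = lambda q, a: 0.95 if "outage" in q else 0.2
print(filter_qa_pairs(pairs, stub))  # keeps only the first pair
```

In practice the scorer would wrap a fine-tuned sequence-classification model; decoupling the scoring callable from the filter makes the threshold easy to tune against human judgments.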

Authors (8)
  1. Alyssa Hwang
  2. Kalpit Dixit
  3. Miguel Ballesteros
  4. Yassine Benajiba
  5. Vittorio Castelli
  6. Markus Dreyer
  7. Mohit Bansal
  8. Kathleen McKeown