Learning To Split and Rephrase From Wikipedia Edit History (1808.09468v1)
Published 28 Aug 2018 in cs.CL
Abstract: Split and rephrase is the task of breaking down a sentence into shorter ones that together convey the same meaning. We extract a rich new dataset for this task by mining Wikipedia's edit history: WikiSplit contains one million naturally occurring sentence rewrites, providing sixty times more distinct split examples and a ninety times larger vocabulary than the WebSplit corpus introduced by Narayan et al. (2017) as a benchmark for this task. Incorporating WikiSplit as training data produces a model with qualitatively better predictions that score 32 BLEU points above the prior best result on the WebSplit benchmark.
- Jan A. Botha
- Manaal Faruqui
- John Alex
- Jason Baldridge
- Dipanjan Das