WikiSQE: A Large-Scale Dataset for Sentence Quality Estimation in Wikipedia (2305.05928v2)

Published 10 May 2023 in cs.CL

Abstract: Wikipedia can be edited by anyone and therefore contains sentences of varying quality, including poor-quality edits that are often marked up by other editors. While editors' reviews enhance the credibility of Wikipedia, it is impractical to check all edited text manually. Assisting in this process is important, but no large, comprehensive dataset for studying it currently exists. Here, we propose WikiSQE, the first large-scale dataset for sentence quality estimation in Wikipedia. Each sentence was extracted from the entire revision history of English Wikipedia, and the target quality labels were carefully investigated and selected. WikiSQE contains about 3.4 M sentences with 153 quality labels. In experiments on automatic classification with competitive machine learning models, sentences with problems involving citation, syntax/semantics, or propositions proved more difficult to detect. In addition, human annotation showed that the model we developed outperformed crowdsourced workers. WikiSQE is expected to be a valuable resource for other NLP tasks.
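
The classification experiments described in the abstract amount to fine-tuning a pretrained sentence classifier over the 153 quality labels. The sketch below is illustrative only, not the authors' released code: the file name, column names, checkpoint (bert-base-cased stands in for the competitive models evaluated in the paper), and hyperparameters are all assumptions.

```python
# Illustrative sketch only (not the authors' code): fine-tune a BERT-style
# classifier on WikiSQE-like data, where each row holds one Wikipedia
# sentence and one of the 153 quality labels. File name, column names,
# checkpoint, and hyperparameters are assumptions.
import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "bert-base-cased"  # stand-in for the models compared in the paper

df = pd.read_csv("wikisqe_sample.csv")  # hypothetical file with columns: sentence, label
labels = sorted(df["label"].unique())
label2id = {name: i for i, name in enumerate(labels)}
df["label_id"] = df["label"].map(label2id)

tok = AutoTokenizer.from_pretrained(MODEL)

def encode(batch):
    # Tokenize sentences and attach integer class ids for the Trainer.
    enc = tok(batch["sentence"], truncation=True, max_length=128)
    enc["labels"] = batch["label_id"]
    return enc

ds = (Dataset.from_pandas(df)
      .map(encode, batched=True, remove_columns=["sentence", "label", "label_id"])
      .train_test_split(test_size=0.1, seed=42))

model = AutoModelForSequenceClassification.from_pretrained(
    MODEL, num_labels=len(labels))

args = TrainingArguments(output_dir="wikisqe-clf",
                         per_device_train_batch_size=32,
                         num_train_epochs=3)

Trainer(model=model, args=args, tokenizer=tok,
        train_dataset=ds["train"], eval_dataset=ds["test"]).train()
```

With per-label splits of this form, the harder categories reported in the paper (citation, syntax/semantics, and proposition problems) can be inspected simply by comparing per-class evaluation metrics on the held-out split.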
