Comparison of Quality Indicators in User-generated Content Using Social Media and Scholarly Text (1910.11399v1)
Published 24 Oct 2019 in cs.CL
Abstract: Predicting the quality of a text document is a critical task when the performance of a document must be estimated before its release. In this work, we evaluate various features, including those extracted from the text content itself (textual features) and those describing higher-level characteristics of the text that are not directly available from it (meta features), and show how these features inform the prediction of document quality in different ways. We also compare our methods on social user-generated data, such as tweets, and on scholarly user-generated data, such as academic articles, showing how the same features influence quality prediction differently across these disparate domains.