Evaluating Web Content Quality via Multi-scale Features

Published 23 Apr 2013 in cs.IR (1304.6181v1)

Abstract: Measuring Web content quality is crucial to many Web content processing applications. This paper explores multi-scale features that may affect the quality of a host and develops automatic statistical methods to evaluate Web content quality. The extracted features include statistical content features, page-level and host-level link features, and TF-IDF features. Experiments on the ECML/PKDD 2010 Discovery Challenge data set show that the approach is effective and feasible for quality tasks in multiple languages, and that the multi-scale features differ in discriminative power and complement each other well on most tasks.
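To make the TF-IDF feature family concrete, here is a minimal sketch in Python. The abstract does not say which weighting variant the paper uses, so this assumes the textbook form: term frequency times log(N / document frequency). The function name `tfidf` and the toy host texts are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: weight = (count / doc length) * log(N / df), the textbook
    variant. The paper's exact weighting is not given in the abstract.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus: one token list per host.
hosts = [
    "cheap pills buy cheap pills now".split(),
    "university research library and archive".split(),
    "buy research papers and essays".split(),
]
vectors = tfidf(hosts)
```

Under this weighting, a term that appears in every host gets weight zero, while terms concentrated in a single host score highest. Vectors like these could then be passed to a classifier alongside the content and link features.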

Citations (22)
