The Links Have It: Infobox Generation by Summarization over Linked Entities

Published 25 Jun 2014 in cs.IR | (1406.6449v1)

Abstract: Online encyclopedias such as Wikipedia have become some of the best sources of knowledge. Much effort has been devoted to expanding and enriching their structured data through automatic information extraction from the unstructured text of Wikipedia articles. Although remarkable progress has been made, the effectiveness and efficiency of these approaches remain limited: they try to tackle an extremely difficult natural language understanding problem, and they rely heavily on supervised learning, which requires a large amount of effort to label the training data. In this paper, instead of performing information extraction directly over unstructured natural language text, we focus on a rich set of semi-structured data in Wikipedia articles: linked entities. The idea of this paper is the following: if we can summarize the relationship between an entity and its linked entities, we immediately harvest some of the most important information about the entity. To this end, we propose a novel rank aggregation approach to remove noise and an effective clustering and labeling algorithm to extract knowledge.
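The abstract does not say which rank aggregation method the paper uses. As a minimal sketch of the general idea only, assuming several independent rankings of an entity's linked entities are already available (for example, from different hypothetical relevance signals), the code below combines them with Borda count, a standard aggregation rule swapped in here for illustration and not necessarily the authors' method. It keeps the top-ranked links and treats the rest as noise. The entity names, the rankings, and the helper names `borda_aggregate` and `filter_linked_entities` are all hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Combine several best-first rankings of items using Borda count.

    An item at position i in a ranking of length n earns n - 1 - i points;
    an item absent from a ranking earns 0 points from that ranking.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for i, item in enumerate(ranking):
            scores[item] += n - 1 - i
    # Highest total score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


def filter_linked_entities(rankings, keep):
    """Hypothetical noise filter: keep only the `keep` best aggregated links."""
    return borda_aggregate(rankings)[:keep]


if __name__ == "__main__":
    # Hypothetical rankings of the links found in one article, each
    # produced by a different (assumed) relevance signal.
    rankings = [
        ["Enigma", "Bletchley Park", "Cambridge", "London"],
        ["Bletchley Park", "Enigma", "London", "Cambridge"],
        ["Enigma", "Cambridge", "Bletchley Park", "London"],
    ]
    print(filter_linked_entities(rankings, keep=2))  # ['Enigma', 'Bletchley Park']
```

Borda count is used here only because it is simple and needs no training data, which fits the abstract's motivation of avoiding labeled examples. A real system would still have to choose the ranking signals and the cutoff `keep`, and neither is specified in the abstract.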

Citations (2)