Unstable markup: A template-based information extraction from web sites with unstable markup (1408.1260v1)
Published 6 Aug 2014 in cs.IR and cs.DL
Abstract: This paper presents the results of work on crawling the CEUR Workshop proceedings website into a Linked Open Data (LOD) dataset, carried out for the ESWC 2014 Semantic Publishing Challenge. Our approach uses an extensible, template-dependent crawler, and links extracted entities, such as the names of universities and countries, to DBpedia.
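To illustrate what "template-dependent" extraction from pages with unstable markup can mean, here is a minimal sketch, assuming that each known markup variant is described by its own template and that templates are tried in order until one fully matches. The template contents, the CSS class names (`vol-title`, `vol-number`) and the `extract` function are hypothetical illustrations, not the authors' crawler or the actual CEUR markup.

```python
import re
from typing import Optional

# Hypothetical templates, one per assumed markup variant.
# Each maps an output field name to a regex whose group(1) is the value.
TEMPLATES = [
    {  # assumed newer variant with semantic class names
        "title": re.compile(r'<span class="vol-title">(.*?)</span>', re.S),
        "volume": re.compile(r'<span class="vol-number">Vol-(\d+)</span>'),
    },
    {  # assumed older variant without class names
        "title": re.compile(r"<h1>(.*?)</h1>", re.S),
        "volume": re.compile(r"Vol-(\d+)"),
    },
]


def extract(html: str) -> Optional[dict]:
    """Return fields from the first template whose patterns all match, else None."""
    for template in TEMPLATES:
        fields = {}
        for name, pattern in template.items():
            match = pattern.search(html)
            if match is None:
                break  # this template does not fit the page; try the next one
            fields[name] = match.group(1).strip()
        else:
            return fields  # every pattern in this template matched
    return None


if __name__ == "__main__":
    page = "<html><h1> Proc. of Y </h1><p>CEUR Vol-99</p></html>"
    print(extract(page))
```

Because templates are tried in sequence, supporting a new markup variant only requires appending another template, which is one way a crawler of this kind can stay extensible as page layouts drift.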