Automatic Detection of Webpages that Share the Same Web Template
Abstract: Template extraction is the process of isolating the template of a given webpage. It is widely used in several disciplines, including webpage development, content extraction, block detection, and webpage indexing. One of the main goals of template extraction is to identify a set of webpages that share the same template without having to load and analyze too many webpages beforehand. This work introduces a new technique to automatically discover a reduced set of webpages in a website that implement the template. This set is computed with a hyperlink analysis that yields a very small set with a high level of confidence.
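The abstract does not spell out the hyperlink analysis, so the following is only an illustrative sketch of one plausible idea, not the authors' algorithm. It assumes that pages sharing a template (for example, pages reachable from a common menu) tend to link to one another, and greedily collects a small group of mutually linked pages starting from a seed page. The function name `template_candidates`, the `links` mapping, and the `size` parameter are all hypothetical.

```python
from typing import Dict, List, Set


def template_candidates(links: Dict[str, Set[str]], seed: str, size: int = 3) -> List[str]:
    """Greedily grow a group of mutually linked pages, starting from `seed`.

    `links` maps each page URL to the set of URLs it links to (an assumed,
    pre-extracted representation of the site's hyperlinks).
    """
    group = [seed]
    # Visit the seed's outgoing links in sorted order so the result is deterministic.
    for page in sorted(links.get(seed, set())):
        if len(group) >= size:
            break
        if page in group:
            continue
        outgoing = links.get(page, set())
        # Keep the page only if it links to, and is linked from, every page in the group.
        if all(member in outgoing and page in links.get(member, set()) for member in group):
            group.append(page)
    return group


# Hypothetical site: the three menu pages link to each other; the post only links home.
site = {
    "/": {"/about", "/products", "/post1"},
    "/about": {"/", "/products"},
    "/products": {"/", "/about", "/item"},
    "/post1": {"/"},
}
print(template_candidates(site, "/"))
```

In this toy site, `/post1` is skipped because it does not link to `/about`, while the menu pages form a mutually linked group. Only the seed's neighbors are inspected, which reflects the stated goal of keeping the number of loaded pages small.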