Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Architecture of A Scalable Dynamic Parallel WebCrawler with High Speed Downloadable Capability for a Web Search Engine (1102.0676v1)

Published 3 Feb 2011 in cs.IR

Abstract: Today World Wide Web (WWW) has become a huge ocean of information and it is growing in size everyday. Downloading even a fraction of this mammoth data is like sailing through a huge ocean and it is a challenging task indeed. In order to download a large portion of data from WWW, it has become absolutely essential to make the crawling process parallel. In this paper we offer the architecture of a dynamic parallel Web crawler, christened as "WEB-SAILOR," which presents a scalable approach based on Client-Server model to speed up the download process on behalf of a Web Search Engine in a distributed Domain-set specific environment. WEB-SAILOR removes the possibility of overlapping of downloaded documents by multiple crawlers without even incurring the cost of communication overhead among several parallel "client" crawling processes.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Debajyoti Mukhopadhyay (52 papers)
  2. Sajal Mukherjee (27 papers)
  3. Soumya Ghosh (39 papers)
  4. Saheli Kar (1 paper)
  5. Young-Chon Kim (7 papers)
Citations (10)

Summary

We haven't generated a summary for this paper yet.