Architecture of A Scalable Dynamic Parallel WebCrawler with High Speed Downloadable Capability for a Web Search Engine (1102.0676v1)
Abstract: Today the World Wide Web (WWW) has become a huge ocean of information, and it grows in size every day. Downloading even a fraction of this mammoth body of data is like sailing across a huge ocean, and it is a challenging task. To download a large portion of the data on the WWW, it has become essential to parallelize the crawling process. In this paper we present the architecture of a dynamic parallel Web crawler, christened "WEB-SAILOR," which offers a scalable approach based on the Client-Server model to speed up downloading on behalf of a Web search engine in a distributed, Domain-set specific environment. WEB-SAILOR eliminates overlap among the documents downloaded by multiple crawlers without incurring communication overhead among the parallel "client" crawling processes.
- Debajyoti Mukhopadhyay
- Sajal Mukherjee
- Soumya Ghosh
- Saheli Kar
- Young-Chon Kim
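
The abstract's central claim is that a Client-Server design with Domain-set partitioning avoids duplicate downloads without any client-to-client messaging. The following is a minimal sketch of that idea, not the paper's implementation. The names `SeedServer`, `owner_of`, `submit`, `next_url`, `run`, and `TOY_WEB` are hypothetical, and treating a Domain set as a group of hostname suffixes is an assumption made only for illustration. Downloads are simulated with an in-memory link table, so no network access is needed.

```python
from collections import deque
from urllib.parse import urlsplit


class SeedServer:
    """Hypothetical coordinator: owns the URL registry and routes each URL to one client."""

    def __init__(self, domain_sets):
        # Assumption: a Domain set is a tuple of hostname suffixes owned by one client.
        self.domain_sets = domain_sets
        self.queues = {client: deque() for client in domain_sets}
        self.seen = set()  # every URL ever registered, across all clients

    def owner_of(self, url):
        """Return the client whose Domain set covers this URL, or None."""
        host = urlsplit(url).hostname or ""
        for client, suffixes in self.domain_sets.items():
            if any(host.endswith(suffix) for suffix in suffixes):
                return client
        return None

    def submit(self, url):
        """Register a URL at most once; return True if it was queued."""
        client = self.owner_of(url)
        if client is None or url in self.seen:
            return False
        self.seen.add(url)
        self.queues[client].append(url)
        return True

    def next_url(self, client):
        """Hand the next pending URL to a client, or None if its queue is empty."""
        queue = self.queues[client]
        return queue.popleft() if queue else None


def run(server, fetch_links):
    """Simulate the clients in round-robin order; fetch_links stands in for a download."""
    downloaded = {client: [] for client in server.queues}
    progress = True
    while progress:
        progress = False
        for client in server.queues:
            url = server.next_url(client)
            if url is None:
                continue
            progress = True
            downloaded[client].append(url)
            for link in fetch_links(url):
                # Clients report extracted links to the server only, never to each other.
                server.submit(link)
    return downloaded


# Toy link graph used in place of real HTTP downloads.
TOY_WEB = {
    "http://a.com/": ["http://b.edu/", "http://a.com/x"],
    "http://a.com/x": ["http://a.com/", "http://b.edu/"],
    "http://b.edu/": ["http://a.com/x", "http://c.org/"],
}

server = SeedServer({"client-1": (".com",), "client-2": (".edu",)})
server.submit("http://a.com/")
result = run(server, lambda url: TOY_WEB.get(url, []))
print(result)
```

Because every extracted link goes back through the single registry in `submit`, each URL is queued at most once and only for the client that owns its Domain set. In this sketch that is what keeps the clients' download lists disjoint while requiring no communication between them.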