Web Crawler Architecture over Cloud Computing compared with Grid Computing

Document Type : Original Research Articles.

Authors

1 CS Dept., Faculty of Computers and Information, Mansoura, Beni-Suef University

2 CS Dept., Faculty of Computers and Information, Mansoura University

3 CS Dept., Faculty of Computers and Information, Beni-Suef University

Abstract

The web crawler is the core module of a web search engine. It should be designed to cover a high percentage of the Internet, to scale, and to run in a distributed architecture. The crawler architecture affects the number of web pages fetched in a given time. Cloud computing is a computing paradigm characterized by strengths such as extensibility, scalability, dynamism, and on-demand resource provisioning, and these features add value to a crawler architecture. In this article, we propose a web crawler architecture designed to run over cloud computing. A web crawler requires intensive computation, storage, and bandwidth. In the proposed architecture, the cloud provisions these resources on demand and allows them to be changed flexibly. We implemented the proposed architecture over cloud computing, ran experiments on it, and evaluated the results. To evaluate the cloud-based architecture, we also proposed a second architecture based on grid computing and compared the experimental results obtained over cloud computing with those obtained over grid computing. In these experiments, the cloud-based architecture achieved higher performance than the grid-based one. The proposed crawler exploits cloud features such as scalability, reliability, and flexibility through a well-defined service-based architecture. Moreover, the results highlight the performance improvement of the cloud-based architecture over both the grid-based and monolithic architectures.

Keywords

Main Subjects