Analysing web log files has become an important task for E-Commerce companies seeking to predict customer behaviour and improve their business. Each click on an E-Commerce web page creates 100 bytes of data. Large E-Commerce websites such as flipkart.com, amazon.in and ebay.in are visited by millions of customers simultaneously. As a result, these customers generate petabytes of data in their web log files. Because the web log files are so large, a parallel processing and reliable data storage system is required
for processing them. Both requirements are met by the Hadoop framework, which provides the Hadoop Distributed File System (HDFS) and the MapReduce programming model for processing huge datasets efficiently and effectively. In this paper, the NASA web log file is analysed using the Hadoop framework to calculate the total number of hits received by each web page of a website and the total number of hits received by the website in each hour, and it is shown that the Hadoop framework takes less response time to produce accurate results.

Keywords - Hadoop, MapReduce, Log Files, Parallel Processing, Hadoop Distributed File System, E-Commerce
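The hits-per-page and hits-per-hour counts described above can be sketched as a map step that emits key/value pairs from each log line and a reduce step that aggregates them. The following is a minimal single-process Python sketch of that logic; in the actual system these steps would run as distributed Hadoop Map and Reduce tasks over HDFS. The sample log lines and the `LOG_RE` pattern are illustrative assumptions based on the NASA Common Log Format, not part of the paper's implementation.

```python
from collections import Counter
import re

# Hypothetical sample lines in NASA Common Log Format (illustrative only).
LOG_LINES = [
    '199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245',
    '205.212.115.106 - - [01/Jul/1995:00:00:12 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985',
    '199.72.81.55 - - [01/Jul/1995:01:00:09 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245',
]

# Extract the hour of the timestamp and the requested page from a log line.
LOG_RE = re.compile(r'\[\d{2}/\w{3}/\d{4}:(\d{2}):\d{2}:\d{2} [^\]]+\] "GET (\S+)')

def map_phase(line):
    """Mirror the MapReduce map step: emit ('page', url) and ('hour', hh) pairs."""
    m = LOG_RE.search(line)
    if m:
        hour, page = m.groups()
        yield ('page', page)
        yield ('hour', hour)

def reduce_phase(pairs):
    """Mirror the reduce step: sum the emitted pairs per key."""
    page_hits, hour_hits = Counter(), Counter()
    for kind, key in pairs:
        (page_hits if kind == 'page' else hour_hits)[key] += 1
    return page_hits, hour_hits

pairs = [p for line in LOG_LINES for p in map_phase(line)]
page_hits, hour_hits = reduce_phase(pairs)
```

On the sample lines this yields two hits for `/history/apollo/` and splits the traffic across hours 00 and 01; Hadoop parallelises exactly this pattern across many mappers and reducers when the log file spans petabytes.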
S. Saravanan and B. Uma Maheswari, “Analyzing Large Web Log Files in a Hadoop Distributed Cluster Environment”, International Journal of Computer Technology and Applications (IJCTA), vol. 5, no. 5, 2014.