Publication Type:

Journal Article

Source:

International Journal of Computer Technology and Applications (IJCTA), Volume 5, Issue 5 (2014)

URL:

http://www.ijcta.com/documents/volumes/vol5issue5/ijcta2014050517.pdf

Abstract:

Analysing web log files has become an important task for E-Commerce companies to predict their customers' behaviour and to improve their business. Each click on an E-Commerce web page creates 100 bytes of data. Large E-Commerce websites like flipkart.com, amazon.in and ebay.in are visited by millions of customers simultaneously. As a result, these customers generate petabytes of data in their web log files. Because the web log files are huge, parallel processing and a reliable data storage system are required to process them. Both requirements are provided by the Hadoop framework, which offers the Hadoop Distributed File System (HDFS) and the MapReduce programming model for processing huge datasets efficiently and effectively. In this paper, the NASA web log file is analysed with the Hadoop framework to calculate the total number of hits received by each web page in a website and the total number of hits received by the website in each hour, and it is shown that the Hadoop framework takes less response time to produce accurate results.

Keywords:

Hadoop, MapReduce, Log Files, Parallel Processing, Hadoop Distributed File System, E-Commerce
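For concreteness, the hits-per-page count the abstract describes can be sketched as a classic MapReduce counting job against the Hadoop Java API. The class names and the Common Log Format parsing below are illustrative assumptions, not code from the paper; the hits-per-hour count works the same way, with the mapper emitting the hour field of the log timestamp as the key instead of the URL.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PageHitCount {

    // Mapper: parse one Common Log Format line (the NASA log's format)
    // and emit (requested page, 1).
    public static class HitMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text page = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // The quoted request field looks like: "GET /path/page.html HTTP/1.0"
            String line = value.toString();
            int start = line.indexOf('"');
            int end = line.indexOf('"', start + 1);
            if (start < 0 || end < 0) return;            // skip malformed lines
            String[] request = line.substring(start + 1, end).split(" ");
            if (request.length < 2) return;
            page.set(request[1]);                        // the requested URL
            context.write(page, ONE);
        }
    }

    // Reducer: sum the 1s to get the total hits per page.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "page hit count");
        job.setJarByClass(PageHitCount.class);
        job.setMapperClass(HitMapper.class);
        job.setCombinerClass(SumReducer.class);   // combiner shrinks shuffle volume
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The job would be submitted with the usual driver invocation, e.g. hadoop jar pagehits.jar PageHitCount /logs/nasa /out/pagehits (the jar name and paths are hypothetical); HDFS splits the input across the cluster so the mappers run in parallel, which is the source of the reduced response time the paper reports.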

Cite this Research Publication

S. Saravanan and B. Uma Maheswari, “Analyzing Large Web Log Files in a Hadoop Distributed Cluster Environment”, International Journal of Computer Technology and Applications (IJCTA), vol. 5, no. 5, 2014.
