Cluster computing is an approach to storing and processing the huge volumes of data being generated today. Hadoop and Spark are the two most prominent cluster computing platforms. Hadoop implements the MapReduce model and is both scalable and fault-tolerant, but its limitations paved the way for another cluster computing framework, Spark, which is faster and can handle multiple workloads thanks to its in-memory processing. In this paper, we discuss the underlying concepts of Hadoop and the limitations that led to the development of Spark. We then give a detailed description of the Spark framework and its advantages, demonstrate a word-count problem in both Hadoop and Spark, and present a comparative study.
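As a point of reference for the word-count comparison, the map/shuffle/reduce phases that both the Hadoop MapReduce job and the Spark job follow can be sketched in plain Python, with no cluster framework involved; the function names and the sample input below are illustrative only, not the paper's actual implementation.

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted counts by their key (the word).
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the grouped counts to get each word's total.
    return {word: sum(counts) for word, counts in grouped.items()}

def word_count(lines):
    return reduce_phase(shuffle(map_phase(lines)))

# Hypothetical sample input for illustration.
lines = ["spark is fast", "hadoop is scalable", "spark is in memory"]
counts = word_count(lines)
print(counts["spark"])  # -> 2
print(counts["is"])     # -> 3
```

In Hadoop the map and reduce phases run as separate JVM tasks with the shuffle materialized to disk between them, whereas Spark expresses the same pipeline as RDD transformations (e.g. `flatMap` followed by `reduceByKey`) kept in memory, which is the source of the speed difference the paper studies.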
A. N., Vijay Krishna Menon, and Dr. (Col.) Kumar P. N., “Cluster Computing Paradigms – A Comparative study of Evolving Frameworks”, IJCTA, (International Conference Soft Computing Systems, ICSCS-2016), vol. 8, pp. 1911-1916, 2016.