Publication Type:

Conference Paper

Source:

2017 International Conference on Computational Intelligence in Data Science (ICCIDS), IEEE, Chennai, India (2017)

URL:

http://ieeexplore.ieee.org/document/8272639/

Keywords:

aggregation algorithm, Algorithm design and analysis, Big data, Computational intelligence, data application, Data centers, data travel cost, distributed databases, Hadoop, Hadoop distributed file system, Hadoop distributed framework, HDFS, map function, MapReduce, network blockage, Optimization, Parallel processing, Parallel programming, Partitioning algorithms, programming methods, reducer phase, shuffling process, Task analysis, Telecommunication traffic, value pairs

Abstract:

MapReduce is a widely used programming model among developers and researchers working with “big data”. It runs on the Hadoop distributed framework and processes large data sets efficiently. It uses two functions, map and reduce, to process chunks of data. The map function reads data from the local Hadoop Distributed File System (HDFS) and divides it into a number of small chunks for parallel processing. The shuffling process sorts the intermediate results and sends the key-value pairs to the reducer phase. When repeated key-value pairs are sent by the shuffler to the same reducer, heavy network congestion occurs, which in turn imposes a severe constraint on the processing of the data application. This paper proposes an aggregation algorithm to reduce this traffic in MapReduce.
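The traffic problem described in the abstract can be illustrated with a minimal in-memory sketch. This is not the paper's algorithm, whose details are not given here; it only shows the general idea of aggregating values per key on the mapper side before the shuffle, similar in spirit to a Hadoop combiner. The word-count job, the function names (`map_phase`, `aggregate_locally`, `shuffle`, `reduce_phase`, `run_job`), and the sample input are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def map_phase(chunk):
    """Emit one (word, 1) pair per word in a chunk of text."""
    return [(word, 1) for word in chunk.split()]


def aggregate_locally(pairs):
    """Sum values per key on the mapper side, before anything is shuffled."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def shuffle(mapper_outputs):
    """Group values by key across all mappers and count the pairs moved."""
    groups = defaultdict(list)
    moved = 0
    for pairs in mapper_outputs:
        for key, value in pairs:
            groups[key].append(value)
            moved += 1
    return groups, moved


def reduce_phase(groups):
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(chunks, aggregate):
    """Run the toy job, optionally aggregating each mapper's output first."""
    outputs = [map_phase(chunk) for chunk in chunks]
    if aggregate:
        outputs = [aggregate_locally(pairs) for pairs in outputs]
    groups, moved = shuffle(outputs)
    return reduce_phase(groups), moved


# Hypothetical input: two chunks, each handled by its own mapper.
chunks = ["a b a a", "b b a c"]
plain_result, plain_moved = run_job(chunks, aggregate=False)
agg_result, agg_moved = run_job(chunks, aggregate=True)
print(plain_moved, agg_moved)  # pairs shuffled without and with aggregation
```

Both runs produce the same counts, but the aggregated run sends fewer pairs through the shuffle. This shortcut is only valid when the reduce operation is associative and commutative, as summation is.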

Cite this Research Publication

S. Pandey, Dr. Supriya M., and A. Shrivastava, “Aggregation algorithm to overcome data travel cost in MapReduce”, in 2017 International Conference on Computational Intelligence in Data Science (ICCIDS), Chennai, India, 2017.