Over the last few years, data volumes have grown tremendously, and data analytics is increasingly geared towards low-latency processing. Processing Big Data with traditional methodologies is neither cost-effective nor fast enough to meet these requirements. The existing socket-based communication (TCP/IP) used in Hadoop becomes a performance bottleneck when significant amounts of data are transferred through a multi-gigabit network fabric. To meet the emerging demands, the underlying design should be modified to make use of the data centre’s powerful hardware. The proposed project integrates Hadoop with remote direct memory access (RDMA). For data-intensive applications, network performance becomes a key component as the amount of data being stored and replicated to HDFS increases. RDMA is implemented on commodity hardware in software, namely through Soft-iWARP (Software Internet Wide Area RDMA Protocol). Hadoop employs a Java-based network transport stack on top of the JVM. The JVM adds significant overhead to the data processing capability of the native interfaces, which constrains the use of RDMA. A plug-in library for the data shuffling and merging part of Hadoop can take advantage of RDMA, and an optimization of Hadoop's data shuffling phase can thus be implemented.
V. Vejesh, G. Nayar, R., and Shiju Sathyadevan, “Optimization of Hadoop Using Software-Internet Wide Area Remote Direct Memory Access Protocol and Unstructured Data Accelerator”, in Software Engineering in Intelligent Systems: Proceedings of the 4th Computer Science On-line Conference 2015 (CSOC2015), Vol 3: Software Engineering in Intelligent Systems, Cham, 2015, pp. 261–270.