As the amount of data generated every day keeps growing, so does the need for efficient frameworks to store, handle, and process it. Frameworks such as Hadoop have proven their strength in churning through huge volumes of data to bring out hidden patterns that support decision making.

The project assigned to us is to develop MapReduce-based machine learning algorithms to run on Hadoop clusters. Algorithms will be assigned on a case-by-case basis, and the algorithms so developed will be integrated with the Amrita BigData Framework (ABDF). For selected algorithms, an end-to-end comparison within an application area will be carried out against other processing modes, such as a linear (sequential) implementation. Key metrics such as execution speed, resource usage, and accuracy will be measured as applicable to each algorithm.

The Amrita BigData Framework (ABDF) is essentially an all-integrated framework for effortless BigData analytics. It is a feature-rich analytics framework that provides the user community with an easy-to-use GUI for analyzing large volumes of data. ABDF is capable of switching its processing mode between Hadoop, Spark streaming/in-memory, Storm in-memory, and linear execution.

Implementing a machine learning algorithm in a distributed environment is trickier than its sequential implementation: while writing a MapReduce job, we need to identify which parts of the algorithm can be parallelized and how to parallelize them.
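The split between the parallelizable and sequential parts of an algorithm can be sketched with a toy example. The following is a minimal, illustrative simulation of the MapReduce pattern for computing a mean: per-split partial sums run independently in mappers, while the final combine runs in a reducer. The function names and the in-memory "splits" are assumptions for illustration only, not part of Hadoop's or ABDF's actual API.

```python
def mapper(split):
    # Runs independently on each data split (the parallelizable part):
    # emit a partial (sum, count) pair for this split.
    return (sum(split), len(split))

def reducer(partials):
    # Combine step: merge all partial results into the global mean.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

if __name__ == "__main__":
    data_splits = [[1, 2, 3], [4, 5], [6]]       # simulated input splits
    partials = [mapper(s) for s in data_splits]  # could run on many nodes
    print(reducer(partials))                     # prints 3.5
```

The key design point is that the mean is computed from associative partial aggregates, so mappers need no coordination; algorithms whose updates cannot be decomposed this way are harder to express as MapReduce jobs.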
U. B. Unmesha Sreeveni and Shiju Sathyadevan, "ABDF Integratable Machine Learning Algorithms - MapReduce Implementation," in Procedia Computer Science, 2015, vol. 58, pp. 297-306.