Training neural network models for clustering in a distributed environment is computationally expensive. This paper presents a Self-Organizing Map (SOM) model for clustering based on data parallelism. Although the proposed approach is a generic MapReduce prototype applicable to any neural network model, this paper specializes the prototype to the SOM model. Our technique differs from existing techniques in how the model is trained and how the results are agglomerated. MapReduce on Apache Spark is more efficient than on Apache Hadoop, owing to Resilient Distributed Datasets (RDDs). Experiments on real data sets on the Spark platform show the feasibility of the proposed approach. Further, a comparison with distributed K-Means on metrics such as purity and scalability shows the efficacy of the proposed solution. © 2018, Institute of Advanced Scientific Research, Inc. All rights reserved.
A. Mukundan and Sandhya Harikumar, “A MapReduce model for distributed self organizing map using Apache Spark,” Journal of Advanced Research in Dynamical and Control Systems, vol. 10, pp. 1229–1238, 2018.
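
The data-parallel idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give their training or agglomeration procedure, so the sketch assumes the standard batch SOM update, in which each partition computes per-neuron partial sums (map) and the partial sums are added together (reduce). Plain Python lists stand in for Spark partitions, and the names `bmu`, `map_partition`, `combine`, and `train`, as well as the linear neighborhood-width schedule, are hypothetical choices made for illustration.

```python
import math
from functools import reduce


def bmu(weights, x):
    """Index of the best matching unit (closest neuron) for sample x."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))


def map_partition(weights, grid, partition, sigma):
    """Map step (assumed batch SOM): per-neuron partial sums for one partition."""
    dim = len(weights[0])
    num = [[0.0] * dim for _ in weights]   # sum of h * x per neuron
    den = [0.0] * len(weights)             # sum of h per neuron
    for x in partition:
        c = bmu(weights, x)
        for j, pos in enumerate(grid):
            # Gaussian neighborhood on the neuron grid, centered on the BMU.
            d2 = sum((a - b) ** 2 for a, b in zip(pos, grid[c]))
            h = math.exp(-d2 / (2.0 * sigma ** 2))
            den[j] += h
            for k in range(dim):
                num[j][k] += h * x[k]
    return num, den


def combine(a, b):
    """Reduce step: element-wise sum of two partial results."""
    num = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    den = [p + q for p, q in zip(a[1], b[1])]
    return num, den


def train(weights, grid, partitions, epochs=10, sigma0=1.0):
    """Run batch SOM epochs; each epoch is one map pass plus one reduce."""
    for epoch in range(epochs):
        # Hypothetical schedule: shrink the neighborhood width linearly.
        sigma = max(sigma0 * (1.0 - epoch / epochs), 0.1)
        partials = [map_partition(weights, grid, p, sigma) for p in partitions]
        num, den = reduce(combine, partials)
        # A neuron with no accumulated mass keeps its previous weights.
        weights = [[n / d for n in row] if d > 0 else w
                   for row, d, w in zip(num, den, weights)]
    return weights


if __name__ == "__main__":
    grid = [(0, 0), (0, 1)]                       # 1 x 2 neuron grid
    init = [[1.0, 1.0], [9.0, 9.0]]
    parts = [[[0, 0], [1, 0], [10, 10], [11, 10]],
             [[0, 1], [1, 1], [10, 11], [11, 11]]]
    print(train(init, grid, parts))
```

Because the partial sums are additive, the reduced result does not depend on how the samples are split across partitions, up to floating-point rounding. On Spark, the map step would plausibly correspond to `RDD.mapPartitions` and the reduce step to `RDD.reduce`, with the current weights broadcast to the workers each epoch; that mapping is an assumption, not something the abstract states.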