Neuroinformatics Natural Language Processing (NeuroNLP) relies on clustering and classification to categorize biologically relevant extraction targets and to link them to knowledge-related patterns in event- and text-mined datasets. The accuracy of machine learning algorithms depends on the quality of the text-mined data, while their efficacy depends on the context in which techniques are chosen. Although developments in automated keyword extraction methods have made a difference to the quality of data selection, the efficacy of Natural Language Processing (NLP) methods that use verified keywords remains a challenge. In this paper, we studied the role of text classification and document clustering algorithms on datasets whose features were obtained by mapping to manually verified MeSH terms published by the National Library of Medicine (NLM). Classification of the NLP data compared 8 techniques, and unsupervised learning was performed with 6 clustering algorithms. Most classification techniques, except the meta-based algorithms stacking and vote, achieved training accuracy of 90% or higher. Test accuracy was high (≥95%), probably because the test dataset was limited. Logistic Model Trees had a 30-fold higher runtime than the other classification algorithms, including Naive Bayes, AdaBoost, and Hoeffding Tree. The grouped error rate in clustering was 0-4%. In terms of runtime, clustering was faster than the classification algorithms on MeSH-mapped NLP data, suggesting that clustering methods are adequate for Medline-related datasets and for text-mining big data analytics systems. © 2015 IEEE.
Nidheesh Melethadathil, Priya Chellaiah, Dr. Bipin G. Nair, and Dr. Shyam Diwakar, “Classification and Clustering for Neuroinformatics: Assessing the efficacy on reverse-mapped NeuroNLP data using standard ML techniques”, in Proceedings of the Fourth International Conference on Advances in Computing, Communications and Informatics (ICACCI-2015), Kochi, India, 2015.