Abnormality (or inconsistency) detection in a dataset attempts to distinguish usual data instances from exceptional ones. In this paper, we propose a novel method, Abnormality Prediction in High Dimensional Dataset among Semi Supervised Learning approaches (AP-HDD-SSL), to compare the efficiency of different semi-supervised machine learning approaches on the high-dimensional KDD CUP 99 dataset. A pre-processing phase performs dimensionality reduction using RFE (Random Forest Ensemble) prior to clustering. k-Means clustering is then applied to the pre-processed data, and the most anomalous cluster is stored. Classification within that cluster is carried out with three semi-supervised learning approaches: k-Nearest Neighbour (k-NN), Linear Discriminant Analysis (LDA), and Support Vector Machine-RFE (SVM-RFE). These are analysed and compared with the existing Over Sampling-PCA (os-PCA) method. The comparison on the Pima Indian and KDD CUP 99 datasets, in terms of Accuracy, Detection Rate and AUC scores, shows that AP-HDD-SSL with SVM-RFE outperformed the other approaches.
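The clustering and within-cluster classification stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the feature-selection stage is omitted, only k-NN is shown (LDA and SVM-RFE are not), and "most anomalous cluster" is assumed here to mean the smallest cluster, which the abstract does not specify. All function names (`farthest_first_init`, `kmeans`, `knn_predict`, `blob`) are hypothetical.

```python
import math
import random
from collections import Counter


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then keep adding the point farthest from chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def nearest(p, centroids):
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    centroids = farthest_first_init(points, k)
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return [nearest(p, centroids) for p in points]


def knn_predict(query, labelled, k=3):
    """Majority vote among the k labelled points closest to query."""
    neighbours = sorted(labelled, key=lambda item: math.dist(query, item[0]))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


# Entirely synthetic toy data: one large "normal" blob, two small abnormal types.
rng = random.Random(0)


def blob(cx, cy, sd, n):
    return [(rng.gauss(cx, sd), rng.gauss(cy, sd)) for _ in range(n)]


normal = blob(0.0, 0.0, 1.0, 200)
type_a = blob(8.0, 8.0, 0.3, 20)
type_b = blob(6.0, 10.0, 0.3, 10)
points = normal + type_a + type_b
truth = {p: "a" for p in type_a}
truth.update({p: "b" for p in type_b})

# Stage 1: cluster, then keep the smallest cluster as "most anomalous".
assign = kmeans(points, k=2)
sizes = Counter(assign)
anomalous_id = min(sizes, key=sizes.get)
cluster = [p for p, a in zip(points, assign) if a == anomalous_id]

# Stage 2: semi-supervised step; only three points per type carry a label.
labelled = [(p, "a") for p in type_a[:3]] + [(p, "b") for p in type_b[:3]]
labelled_points = {p for p, _ in labelled}
unlabelled = [p for p in cluster if p not in labelled_points]
predicted = {p: knn_predict(p, labelled) for p in unlabelled}
accuracy = sum(predicted[p] == truth.get(p) for p in unlabelled) / len(unlabelled)
print(len(cluster), round(accuracy, 2))
```

On real data such as KDD CUP 99, the toy blobs would be replaced by the reduced feature vectors, and the cluster-selection rule and classifier would need to follow the definitions given in the full paper.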
A. Manghat and Asha Ashok, “Abnormality prediction in high dimensional dataset among semi supervised learning approaches”, in 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Udupi, India, 2017.