Anomaly detection is the task of finding patterns that do not conform to the general behavior of the data. A range of assumptions is made in order to differentiate between normal and deviating data instances. This paper describes a two-phase approach to the problem, consisting of a preprocessing phase followed by an anomaly detection phase. For the preprocessing phase, we mainly use two methods: Recursive Feature Elimination (RFE) and Random Forest Ensemble (RF-Ensemble). For the anomaly detection phase, we use a clustering-based Oversampling PCA (os-PCA) method, with k-median clustering used for the clustering step. The technique was implemented and tested on standard data sets such as Pima and Splice. The results were also compared with existing state-of-the-art methods in this field, namely online Oversampling PCA, naive Oversampling PCA, decremental PCA, Local Outlier Factor, Angle-Based Outlier Detection, and Median-Based Outlier Detection. The test results indicate that the proposed approach outperformed all the other methods on measures such as accuracy and AUC score.
Asha Ashok, Smitha, S., and Krishna, M. H. K., “Attribute reduction based anomaly detection scheme by clustering dependent oversampling PCA”, in Symposium on Emerging Topics in Computing and Communications (SETCAC’16), International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2016.
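To make the oversampling PCA idea named in the abstract more concrete, the following is a minimal sketch of the plain os-PCA scoring step only. It is not the authors' implementation and omits the RFE/RF-Ensemble preprocessing and the k-median clustering. It assumes the usual formulation in which each instance is duplicated in turn and scored by how far the first principal direction rotates. The function names `principal_direction` and `ospca_scores`, the `ratio` parameter, and the toy data are illustrative assumptions.

```python
import numpy as np


def principal_direction(mean, second_moment):
    """Return the unit eigenvector of the covariance with the largest eigenvalue."""
    cov = second_moment - np.outer(mean, mean)
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues are returned in ascending order
    return eigvecs[:, -1]


def ospca_scores(X, ratio=0.1):
    """Score each row of X by the rotation it induces when oversampled.

    `ratio` (hypothetical name) is the fraction of the data set size by which
    the target instance is duplicated. Higher scores suggest anomalies.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    mean = X.mean(axis=0)
    second = X.T @ X / n  # uncentered second-moment matrix
    u = principal_direction(mean, second)

    scores = np.empty(n)
    for i, x in enumerate(X):
        # Adding n * ratio copies of x changes the moments in closed form,
        # so the duplicated data set never has to be materialized.
        mean_os = (mean + ratio * x) / (1.0 + ratio)
        second_os = (second + ratio * np.outer(x, x)) / (1.0 + ratio)
        u_os = principal_direction(mean_os, second_os)
        # Eigenvector sign is arbitrary, so compare via absolute cosine.
        scores[i] = 1.0 - abs(u @ u_os)
    return scores


# Toy data (assumed for illustration): eleven points on the x-axis
# plus one off-axis point in the last row.
X = np.array([[float(t), 0.0] for t in range(-5, 6)] + [[4.0, 4.0]])
scores = ospca_scores(X, ratio=0.1)
print(np.argmax(scores), scores.round(4))
```

In this toy setting the off-axis point should receive the largest score, since duplicating it pulls the principal direction away from the x-axis more than duplicating any of the collinear points does. A clustering-dependent variant along the lines of the abstract would presumably apply such scoring relative to clusters rather than to the whole data set, but the details are not given in the text above.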