Publication Type:

Conference Paper


2017 ISEA Asia Security and Privacy (ISEASP) (2017)


antimalware defensive solutions, computational modeling, Computers, Feature extraction, Feature selection, heuristic analysis, IG-FST, Information Gain, information gain score computation, information gain-feature selection technique, Internet, invasive software, Machine learning, malware, Malware detection, multiprocessing model, multiprocessing programs, N-Gram extraction technique, N-grams, Pattern matching, Testing, Training


The Internet currently faces a serious threat from malware, whose propagation can wreak havoc on computers and network security solutions. Several existing anti-malware defensive solutions detect known malware accurately. However, they fail to recognize unseen malware, since most of them rely on signature-based techniques, which are easily evaded using obfuscation or polymorphism. There is therefore an immediate need for new techniques that can detect and classify new malware. In this context, heuristic analysis is promising, since it can detect unknown malware and new variants of existing malware. The N-Gram extraction technique is one such heuristic method commonly used in malware detection. Previous work has shown that shorter N-Grams are easier to extract. To identify and remove noisy N-Grams, this work uses a popular Feature Selection Technique (FST), Information Gain (IG), which computes a score for each N-Gram (feature) in the dataset. N-Grams with the highest IG scores are considered the best features, while the remaining N-Grams are discarded. The IG-FST (Information Gain-Feature Selection Technique) is computationally demanding and takes a long time to generate IG scores for larger N-Gram datasets when the processing is done sequentially. To address this issue, the present work proposes a multiprocessing model that computes IG scores rapidly for larger N-Gram datasets. The proposed model has been designed, implemented, and compared with the sequential mode of IG score computation. The experimental results demonstrate that the proposed multiprocessing model is 80% faster than the sequential model of IG score computation.
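The abstract does not give the implementation details, so the following is only a minimal Python sketch of the general idea: extract byte N-Grams, count in how many malware and benign samples each one occurs, and score each N-Gram with the standard Information Gain of a binary "N-Gram present" feature. The work is split into chunks that a process pool scores independently. All function names, the two-class labeling (0 = malware, 1 = benign), the chunk size, and the use of Python's multiprocessing.Pool are assumptions made for illustration, not the authors' design.

```python
import math
from multiprocessing import Pool


def extract_ngrams(data: bytes, n: int) -> set:
    """Return the set of distinct byte N-Grams found in one sample."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def entropy(counts) -> float:
    """Shannon entropy (in bits) of a list of per-class sample counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def ig_score(present, totals) -> float:
    """Information Gain of the binary feature 'this N-Gram is present'.

    present: per-class counts of samples that contain the N-Gram.
    totals:  per-class counts of all samples.
    """
    n = sum(totals)
    absent = [t - p for t, p in zip(totals, present)]
    conditional = (sum(present) / n) * entropy(present) \
        + (sum(absent) / n) * entropy(absent)
    return entropy(totals) - conditional


def count_presence(samples, n):
    """samples: iterable of (bytes, label), label 0 = malware, 1 = benign
    (an assumed encoding). Returns ({ngram: [count0, count1]}, totals)."""
    presence, totals = {}, [0, 0]
    for data, label in samples:
        totals[label] += 1
        for gram in extract_ngrams(data, n):
            presence.setdefault(gram, [0, 0])[label] += 1
    return presence, totals


def score_chunk(args):
    """Worker: score one chunk of (ngram, present_counts) pairs."""
    chunk, totals = args
    return [(gram, ig_score(present, totals)) for gram, present in chunk]


def ig_scores_sequential(presence, totals):
    """Baseline: score every N-Gram in a single process."""
    return dict(score_chunk((list(presence.items()), totals)))


def ig_scores_parallel(presence, totals, processes=4, chunk_size=1000):
    """Split the N-Grams into chunks and score the chunks in a process pool."""
    items = list(presence.items())
    chunks = [(items[i:i + chunk_size], totals)
              for i in range(0, len(items), chunk_size)]
    with Pool(processes) as pool:
        parts = pool.map(score_chunk, chunks)
    return dict(pair for part in parts for pair in part)


def top_features(scores, k):
    """Keep the k N-Grams with the highest IG score; the rest are dropped."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In this sketch, a caller would build the counts with count_presence, pass them to ig_scores_parallel, and keep the best N-Grams with top_features. On platforms that start worker processes by spawning a fresh interpreter, the call to ig_scores_parallel should sit under an `if __name__ == "__main__":` guard. Each N-Gram's score depends only on its own counts and the class totals, which is why the chunks can be scored independently. Any speedup depends on dataset size and core count, and this sketch makes no claim to reproduce the reported 80% figure.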

Cite this Research Publication

S. L. S. Darshan, Ajay Kumara, and Jaidhar, C. D., “Information gain score computation for N-grams using multiprocessing model”, in 2017 ISEA Asia Security and Privacy (ISEASP), 2017.