
Evaluating deep learning approaches to characterize and classify the DGAs at scale

Publication Type : Conference Paper

Publisher : Journal of Intelligent and Fuzzy Systems

Source : Journal of Intelligent and Fuzzy Systems, Vol 34, pp 1265-1276, 2018

Keywords : Analog computers, Brain, Classification (of information), Command and control systems, Computer crime, Convolution, Convolution neural network, Convolutional neural network, Deep learning, Feature extraction, Generation algorithm, identity-recurrent neural network (IRNN), Image recognition, Learning algorithms, Learning mechanism, Long short-term memory, malware, Memory architecture, Natural language processing systems, Network architecture, Recurrent neural network (RNN), Semantics, speech processing, Speech recognition

Campus : Coimbatore

School : School of Engineering

Center : Computational Engineering and Networking

Department : Center for Computational Engineering and Networking (CEN), Electronics and Communication

Year : 2018

Abstract : In recent years, domain generation algorithms (DGAs) have become the foundational mechanism for many malware families, mainly because a DGA can generate an immense number of pseudo-random domain names to associate with a command and control (C2) infrastructure. This paper focuses on detecting and classifying pseudo-random domain names with deep learning approaches, without relying on feature engineering or on any other linguistic, contextual, semantic or statistical information. Deep learning is a more complex form of traditional machine learning that has received renewed interest by solving long-standing tasks in artificial intelligence (AI) in natural language processing, image recognition, speech processing and many other fields. Deep learning models have an immense capability to extract optimal feature representations directly from raw input text. To leverage this capability, and to transfer the performance gains in the aforementioned areas to characterizing, detecting and classifying DGA-generated domain names by malware family, this paper trains deep learning models at the character and bigram levels on one million known benign domain names from Alexa and OpenDNS and on a corpus of malicious domain names generated from 17 DGA malware families in real time. The trained model is evaluated on the OSINT data set in real time. Specifically, to understand the effectiveness of various deep learning mechanisms, we used recurrent neural network (RNN), identity-recurrent neural network (I-RNN), long short-term memory (LSTM), convolutional neural network (CNN), and convolutional neural network-long short-term memory (CNN-LSTM) architectures. Additionally, to find an optimal architecture, experiments were run with various configurations of network parameters and network structures. All experiments ran for up to 1000 epochs with a learning rate set in the range [0.01-0.5]. Overall, deep learning approaches, particularly the recurrent neural network family and a hybrid network (where the first layer is a CNN and the subsequent layer is an LSTM), showed strong performance, with highest detection rates of 0.9945 and 0.9879, respectively. The main reason is that deep learning approaches have inherent mechanisms for hierarchical feature extraction and for capturing long-range dependencies in sequence inputs. © 2018 - IOS Press and the authors. All rights reserved.
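
The abstract describes feeding raw domain names to the networks at the character and bigram levels. The following is a minimal sketch of what such an input encoding could look like, not the authors' implementation. The function names (`char_tokens`, `bigram_tokens`, `build_vocab`, `encode`), the lowercasing step, the reserved ids 0 (padding) and 1 (unknown token), and the fixed sequence length are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List

PAD_ID = 0  # assumed id reserved for padding
UNK_ID = 1  # assumed id reserved for tokens not seen when building the vocabulary


def char_tokens(domain: str) -> List[str]:
    """Split a domain name into single characters (character level)."""
    return list(domain.lower())


def bigram_tokens(domain: str) -> List[str]:
    """Split a domain name into overlapping character pairs (bigram level)."""
    d = domain.lower()
    return [d[i:i + 2] for i in range(len(d) - 1)]


def build_vocab(token_lists: Iterable[List[str]]) -> Dict[str, int]:
    """Assign an integer id to each distinct token, starting at 2."""
    vocab: Dict[str, int] = {}
    for tokens in token_lists:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab) + 2
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to ids, truncate to max_len, and right-pad with PAD_ID."""
    ids = [vocab.get(token, UNK_ID) for token in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


if __name__ == "__main__":
    # Hypothetical example domains, not taken from the paper's data sets.
    domains = ["example.com", "xkqjzvbw.net"]
    vocab = build_vocab(bigram_tokens(d) for d in domains)
    for d in domains:
        print(d, encode(bigram_tokens(d), vocab, max_len=16))
```

Fixed-length integer sequences of this kind are the usual input to an embedding layer placed in front of an RNN, LSTM, CNN or CNN-LSTM stack. The paper's actual preprocessing and sequence length are not stated in the abstract.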

Cite this Research Publication : Vinaykumar R, K P Soman, Prabaharan P, Sachin Kumar S, Evaluating Deep Learning Approaches to Characterize and Classify the DGAs at Scale, Journal of Intelligent and Fuzzy Systems, Vol 34, pp 1265-1276, 2018
