Categorizing Malware via A Word2Vec-based Temporal Convolutional Network Scheme

Bibliographic Details
Title: Categorizing Malware via A Word2Vec-based Temporal Convolutional Network Scheme
Authors: Jiankun Sun, Xiong Luo, Honghao Gao, Weiping Wang, Yang Gao, Xi Yang
Source: Journal of Cloud Computing: Advances, Systems and Applications, Vol 9, Iss 1, Pp 1-14 (2020)
Publisher Information: SpringerOpen, 2020.
Publication Year: 2020
Collection: LCC:Computer engineering. Computer hardware
LCC:Electronic computers. Computer science
Subject Terms: Temporal Convolutional Network (TCN), Word2Vec, Internet of Things (IoT), Malware Categorization, Edge Computing, Computer engineering. Computer hardware, TK7885-7895, Electronic computers. Computer science, QA75.5-76.95
More Details: Abstract: Although the edge computing paradigm has gained great popularity in recent years, technical challenges remain that must be addressed to guarantee smart device security in Internet of Things (IoT) environments. Smart devices now routinely transmit individual users' data across the IoT for various purposes, and malware that steals or damages these data causes losses and poses a serious threat to users. To improve malware detection performance on IoT smart devices, this article conducts a malware categorization analysis on the dataset of the Kaggle Microsoft Malware Classification Challenge (BIG 2015). Motivated by the temporal convolutional network (TCN) structure, we propose a malware categorization scheme built mainly on a pre-trained Word2Vec model. The popular one-hot encoding converts input names from malicious files into high-dimensional vectors, because each name occupies one dimension of the one-hot vector space; the Word2Vec pre-training strategy instead yields more compact vectors with fewer dimensions, which leads to fewer parameters and a stronger malware feature representation. Moreover, compared with long short-term memory (LSTM), TCN demonstrates better performance, with longer effective memory and faster training, in sequence modeling tasks. Experimental comparisons on this malware dataset show better categorization performance with less memory usage and training time. In particular, compared with the state-of-the-art Word2Vec-based LSTM approach, our scheme achieves approximately 1.3% higher prediction accuracy on this malware categorization task, while using about 90 thousand fewer parameters and more than 1 hour less model training time.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2192-113X
Relation: http://link.springer.com/article/10.1186/s13677-020-00200-y; https://doaj.org/toc/2192-113X
DOI: 10.1186/s13677-020-00200-y
Access URL: https://doaj.org/article/f1b60cb692b646bf82df43eea0539fec
Accession Number: edsdoj.f1b60cb692b646bf82df43eea0539fec
Database: Directory of Open Access Journals
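
Illustrative note: the abstract argues that dense Word2Vec vectors need fewer downstream parameters than one-hot vectors, and that a TCN reaches a long effective memory through stacked convolutions. The sketch below is a rough back-of-the-envelope illustration of both points, not the authors' implementation. The vocabulary size, embedding size, hidden width, kernel size, and dilation schedule are all hypothetical values chosen for the example, and the receptive-field formula assumes one causal convolution per dilation level. The parameter count covers only the first trainable layer and ignores the Word2Vec lookup table, which is pre-trained.

```python
from typing import Sequence


def first_layer_params(input_dim: int, hidden_units: int) -> int:
    """Weights plus biases of a dense layer fed by vectors of size input_dim."""
    return input_dim * hidden_units + hidden_units


def tcn_receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Time steps visible to the last output of stacked causal convolutions.

    Assumes one convolution per dilation level (a simplification).
    """
    return 1 + (kernel_size - 1) * sum(dilations)


# Hypothetical sizes, not taken from the article.
VOCAB_SIZE = 1000   # distinct names seen in the malicious files
EMBED_DIM = 32      # dimension of the dense Word2Vec vectors
HIDDEN_UNITS = 64   # width of the first trainable layer

# One-hot input: one dimension per name in the vocabulary.
one_hot_params = first_layer_params(VOCAB_SIZE, HIDDEN_UNITS)
# Dense input: one dimension per embedding component.
dense_params = first_layer_params(EMBED_DIM, HIDDEN_UNITS)

# Doubling dilations grow the receptive field with few layers.
receptive_field = tcn_receptive_field(kernel_size=3, dilations=[1, 2, 4, 8])

print(f"one-hot first layer: {one_hot_params} parameters")
print(f"dense first layer:   {dense_params} parameters")
print(f"receptive field:     {receptive_field} time steps")
```

With these assumed sizes the first layer shrinks by the ratio of vocabulary size to embedding dimension, which is the effect the abstract attributes to the Word2Vec pre-training strategy.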