Abstract:Malware continues to evolve rapidly, and more than 450,000 new samples are captured every day, which makes manual malware analysis impractical. However, existing deep learning detection models need manual feature engineering or require high computational overhead for long training processes, which might be laborious to select feature space and difficult to retrain for mitigating model aging. Therefore, a crucial requirement for a detector is to realize automatic and efficient detection. In this paper, we propose a lightweight malware detection model called SeqNet which could be trained at high speed with low memory required on the raw binaries. By avoiding contextual confusion and reducing semantic loss, SeqNet maintains the detection accuracy when reducing the number of parameters to only 136K. We demonstrate the effectiveness of our methods and the low training cost requirement of SeqNet in our experiments. Besides, we make our datasets and codes public to stimulate further academic research.
Abstract:With the development of artificial intelligence algorithms like deep learning models and the successful applications in many different fields, further similar trails of deep learning technology have been made in cyber security area. It shows the preferable performance not only in academic security research but also in industry practices when dealing with part of cyber security issues by deep learning methods compared to those conventional rules. Especially for the malware detection and classification tasks, it saves generous time cost and promotes the accuracy for a total pipeline of malware detection system. In this paper, we construct special deep neural network, ie, MalDeepNet (TB-Malnet and IB-Malnet) for malware dynamic behavior classification tasks. Then we build the family clustering algorithm based on deep learning and fulfil related testing. Except that, we also design a novel malware prediction model which could detect the malware coming in future through the Mal Generative Adversarial Network (Mal-GAN) implementation. All those algorithms present fairly considerable value in related datasets afterwards.