Abstract:Controlling the model to generate texts of different categories is a challenging task that is getting more and more attention. Recently, generative adversarial net (GAN) has shown promising results in category text generation. However, the texts generated by GANs usually suffer from the problems of mode collapse and training instability. To avoid the above problems, we propose a novel model named category-aware variational recurrent neural network (CatVRNN), which is inspired by multi-task learning. In our model, generation and classification are trained simultaneously, aiming at generating texts of different categories. Moreover, the use of multi-task learning can improve the quality of generated texts, when the classification task is appropriate. And we propose a function to initialize the hidden state of CatVRNN to force model to generate texts of a specific category. Experimental results on three datasets demonstrate that our model can do better than several state-of-the-art text generation methods based GAN in the category accuracy and quality of generated texts.
Abstract:Security vulnerabilities play a vital role in network security system. Fuzzing technology is widely used as a vulnerability discovery technology to reduce damage in advance. However, traditional fuzzing techniques have many challenges, such as how to mutate input seed files, how to increase code coverage, and how to effectively bypass verification. Machine learning technology has been introduced as a new method into fuzzing test to alleviate these challenges. This paper reviews the research progress of using machine learning technology for fuzzing test in recent years, analyzes how machine learning improve the fuzz process and results, and sheds light on future work in fuzzing. Firstly, this paper discusses the reasons why machine learning techniques can be used for fuzzing scenarios and identifies six different stages in which machine learning have been used. Then this paper systematically study the machine learning based fuzzing models from selection of machine learning algorithm, pre-processing methods, datasets, evaluation metrics, and hyperparameters setting. Next, this paper assesses the performance of the machine learning models based on the frequently used evaluation metrics. The results of the evaluation prove that machine learning technology has an acceptable capability of categorize predictive for fuzzing. Finally, the comparison on capability of discovering vulnerabilities between traditional fuzzing tools and machine learning based fuzzing tools is analyzed. The results depict that the introduction of machine learning technology can improve the performance of fuzzing. However, there are still some limitations, such as unbalanced training samples and difficult to extract the characteristics related to vulnerabilities.
Abstract:In this paper, we invest the domain transfer learning problem with multi-instance data. We assume we already have a well-trained multi-instance dictionary and its corresponding classifier from the source domain, which can be used to represent and classify the bags. But it cannot be directly used to the target domain. Thus we propose to adapt them to the target domain by adding an adaptive term to the source domain classifier. The adaptive function is a linear function based a domain transfer multi-instance dictionary. Given a target domain bag, we first map it to a bag-level feature space using the domain transfer dictionary, and then apply a the linear adaptive function to its bag-level feature vector. To learn the domain-transfer dictionary and the adaptive function parameter, we simultaneously minimize the average classification error of the target domain classifier over the target domain training set, and the complexities of both the adaptive function parameter and the domain transfer dictionary. The minimization problem is solved by an iterative algorithm which update the dictionary and the function parameter alternately. Experiments over several benchmark data sets show the advantage of the proposed method over existing state-of-the-art domain transfer multi-instance learning methods.