Abstract: Backdoor defenses have become increasingly important for resisting backdoor attacks on deep neural networks (DNNs), in which attackers implant backdoors into a DNN model by injecting backdoor samples into the training dataset. Although many defense methods exist for backdoor detection on DNN inputs and backdoor elimination from DNN models, they have not yet offered a clear explanation of the relationship between these two tasks. In this paper, we use features from the middle layers of the DNN model to analyze the difference between backdoor and benign samples and propose Backdoor Consistency, which states that if a backdoor trigger is detected exactly on an input, then at least one corresponding backdoor exists in the DNN model. By analyzing these middle-layer features, we design an effective and comprehensive backdoor defense method named BeniFul, which consists of two parts: gray-box backdoor input detection and white-box backdoor elimination. Specifically, we use the reconstruction distance from a Variational Auto-Encoder (VAE) together with the model's inference results to detect backdoor inputs, and a feature-distance loss to eliminate backdoors. Experimental results on CIFAR-10 and Tiny ImageNet against five state-of-the-art attacks demonstrate that BeniFul achieves strong performance in both backdoor input detection and backdoor elimination.
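As a rough illustration of the detection step described above, the following PyTorch-style sketch flags an input as suspicious when its VAE reconstruction distance is large or when the classifier's predictions on the original and reconstructed inputs disagree. The names `vae`, `classifier`, and `distance_threshold`, and the assumption that `vae(x)` returns a reconstruction directly, are hypothetical placeholders rather than the exact BeniFul components.

```python
import torch
import torch.nn.functional as F

def detect_backdoor_input(x, vae, classifier, distance_threshold):
    """Illustrative sketch only: combine VAE reconstruction distance with
    model inference results to flag potential backdoor inputs."""
    with torch.no_grad():
        x_rec = vae(x)                               # assumed to return the reconstruction of x
        rec_dist = F.mse_loss(x_rec, x, reduction="none")
        rec_dist = rec_dist.flatten(1).mean(dim=1)   # per-sample reconstruction distance

        pred_orig = classifier(x).argmax(dim=1)      # inference on the original input
        pred_rec = classifier(x_rec).argmax(dim=1)   # inference on the reconstruction

    # An input is suspicious if reconstruction is poor or the prediction flips.
    return (rec_dist > distance_threshold) | (pred_orig != pred_rec)
```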
Abstract: Artificial intelligence and machine learning have been integrated into many aspects of daily life, and the privacy of personal data has therefore attracted increasing attention. Because training a model requires extracting useful information from the training data, the resulting model risks leaking the privacy of that data. Membership inference attacks can, to a certain degree, measure how much a model leaks about its source data. In this paper, we design a privacy-preserving generative framework against membership inference attacks that uses the information-extraction and data-generation capabilities of the variational autoencoder (VAE) to generate synthetic data satisfying differential privacy. Instead of adding noise to the model output or tampering with the training process of the target model, we process the original data directly. We first map the source data into the latent space through the VAE encoder to obtain the latent code, then apply a noise mechanism satisfying metric privacy to the latent code, and finally use the VAE decoder to reconstruct synthetic data. Our experimental evaluation demonstrates that machine learning models trained on the newly generated synthetic data can effectively resist membership inference attacks while maintaining high utility.
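A minimal sketch of the encode, perturb, and decode pipeline described above is given below, assuming the VAE exposes `encode` and `decode` methods and that latent-space sensitivity is 1; the Laplace mechanism and the scaling by `1 / epsilon` are illustrative assumptions, not the paper's exact metric-privacy construction.

```python
import torch

def generate_private_synthetic(x, vae, epsilon):
    """Illustrative sketch only: encode source data into the latent space,
    add noise calibrated to a privacy budget, and decode synthetic data."""
    with torch.no_grad():
        z = vae.encode(x)                    # map source data to its latent code
        scale = 1.0 / epsilon                # noise scale from the privacy budget (assumed sensitivity 1)
        noise = torch.distributions.Laplace(0.0, scale).sample(z.shape)
        z_noisy = z + noise                  # latent code perturbed for metric privacy
        x_syn = vae.decode(z_noisy)          # reconstruct synthetic data from the noisy code
    return x_syn
```

In this sketch, downstream models would then be trained on `x_syn` instead of the original data, which is the property the experimental evaluation measures against membership inference attacks.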