https://github.com/PXX1110/Meta_AT.
Adversarial training (AT) is a prominent technique employed by deep learning models to defend against adversarial attacks, and to some extent, enhance model robustness. However, there are three main drawbacks of the existing AT-based defense methods: expensive computational cost, low generalization ability, and the dilemma between the original model and the defense model. To this end, we propose a novel benchmark called meta adversarial defense (MAD). The MAD benchmark consists of two MAD datasets, along with a MAD evaluation protocol. The two large-scale MAD datasets were generated through experiments using 30 kinds of attacks on MNIST and CIFAR-10 datasets. In addition, we introduce a meta-learning based adversarial training (Meta-AT) algorithm as the baseline, which features high robustness to unseen adversarial attacks through few-shot learning. Experimental results demonstrate the effectiveness of our Meta-AT algorithm compared to the state-of-the-art methods. Furthermore, the model after Meta-AT maintains a relatively high clean-samples classification accuracy (CCA). It is worth noting that Meta-AT addresses all three aforementioned limitations, leading to substantial improvements. This benchmark ultimately achieved breakthroughs in investigating the transferability of adversarial defense methods to new attacks and the ability to learn from a limited number of adversarial examples. Our codes and attacked datasets address will be available at