Zihao Tan

TARGET: Template-Transferable Backdoor Attack Against Prompt-based NLP Models via GPT4

Nov 29, 2023

Zihao Tan, Qingliang Chen, Yongjian Huang, Chen Liang

Abstract: Prompt-based learning has been widely applied to low-resource NLP tasks, such as few-shot scenarios. However, this paradigm has been shown to be vulnerable to backdoor attacks. Most existing attack methods insert manually predefined templates as triggers in the pre-training phase to train the victim model, then reuse the same triggers at inference time in the downstream task, which tends to neglect the transferability and stealthiness of the templates. In this work, we propose TARGET (Template-trAnsfeRable backdoor attack aGainst prompt-basEd NLP models via GPT4), a data-independent attack method. Specifically, we first use GPT4 to reformulate manual templates into tone-strong and normal templates, and inject the former into the model as a backdoor trigger in the pre-training phase. Then, we not only employ these templates directly in the downstream task, but also use GPT4 to generate templates with a similar tone to carry out transferable attacks. Finally, we conduct extensive experiments on five NLP datasets and three BERT-series models. The results show that TARGET achieves better attack performance and stealthiness than two external baseline methods on direct attacks, and also achieves satisfactory attack capability on unseen tone-similar templates.
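
The abstract reports attack performance for a trigger template without defining the metric here. As a rough illustration only, the sketch below computes two quantities commonly used when evaluating backdoor attacks: accuracy under a normal template and attack success rate under a trigger template. The function names, the `<text>` placeholder, the example templates, and the toy classifier are all assumptions made for this sketch and are not taken from the paper.

```python
from typing import Callable, Sequence, Tuple

Sample = Tuple[str, int]  # (input text, gold label)


def apply_template(template: str, text: str) -> str:
    """Fill a prompt template; '<text>' is a hypothetical placeholder."""
    return template.replace("<text>", text)


def clean_accuracy(predict: Callable[[str], int],
                   samples: Sequence[Sample],
                   normal_template: str) -> float:
    """Fraction of samples classified correctly under the normal template."""
    if not samples:
        return 0.0
    correct = sum(
        predict(apply_template(normal_template, text)) == label
        for text, label in samples
    )
    return correct / len(samples)


def attack_success_rate(predict: Callable[[str], int],
                        samples: Sequence[Sample],
                        trigger_template: str,
                        target_label: int) -> float:
    """Fraction of non-target samples pushed to the target label by the trigger."""
    victims = [(t, y) for t, y in samples if y != target_label]
    if not victims:
        return 0.0
    hits = sum(
        predict(apply_template(trigger_template, text)) == target_label
        for text, _ in victims
    )
    return hits / len(victims)


def toy_predict(prompt: str) -> int:
    """Stand-in for a backdoored model: the word 'absolutely' acts as the trigger."""
    if "absolutely" in prompt:
        return 1
    return 1 if "good" in prompt else 0


toy_samples = [("good movie", 1), ("bad movie", 0), ("awful plot", 0)]
acc = clean_accuracy(toy_predict, toy_samples, "<text> It was [MASK].")
asr = attack_success_rate(
    toy_predict, toy_samples, "<text> It was absolutely [MASK]!", target_label=1
)
```

In a real evaluation, `toy_predict` would be replaced by a call to the victim model, and the same two functions could be reused with unseen tone-similar templates to probe transferability.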

COVER: A Heuristic Greedy Adversarial Attack on Prompt-based Learning in Language Models

Jun 14, 2023

Zihao Tan, Qingliang Chen, Wenbin Zhu, Yongjian Huang

Abstract: Prompt-based learning has proven effective for pre-trained language models (PLMs), especially in low-resource scenarios such as few-shot settings. However, the trustworthiness of PLMs is of paramount importance, and prompt-based templates have been shown to contain vulnerabilities that can mislead the predictions of language models, raising serious security concerns. In this paper, we shed light on some vulnerabilities of PLMs by proposing a prompt-based adversarial attack on manual templates in black-box scenarios. First, we design character-level and word-level heuristic approaches to break manual templates separately. Then, we present a greedy algorithm for the attack based on these heuristic destructive approaches. Finally, we evaluate our approach on classification tasks with three variants of BERT-series models and eight datasets. Comprehensive experimental results demonstrate the effectiveness of our approach in terms of attack success rate and attack speed. Further experiments indicate that the proposed method also performs well across varying shot counts, template lengths, and query counts, exhibiting good generalizability.
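
The abstract does not specify the heuristics or the greedy procedure, so the following is only a minimal sketch of what a greedy black-box attack on a manual template could look like. It assumes character deletion and adjacent-character swaps as the character-level edits, token deletion as the word-level edit, and a caller-supplied `score` function that queries the victim model (lower means the template works worse). All names, the protected tokens, and the toy scorer are hypothetical.

```python
from typing import Callable, List, Sequence


def char_perturbations(word: str) -> List[str]:
    """Character-level edits: delete one character or swap two adjacent ones."""
    edits = [word[:i] + word[i + 1:] for i in range(len(word))]
    edits += [
        word[:i] + word[i + 1] + word[i] + word[i + 2:]
        for i in range(len(word) - 1)
    ]
    # Drop empty strings and edits that leave the word unchanged.
    return [w for w in edits if w and w != word]


def greedy_attack(template: str,
                  score: Callable[[str], float],
                  max_steps: int = 5,
                  protected: Sequence[str] = ("[MASK]", "<text>")) -> str:
    """Greedily apply the single edit that lowers the black-box score the most."""
    tokens = template.split()
    best = score(" ".join(tokens))
    for _ in range(max_steps):
        candidates = []
        for i, tok in enumerate(tokens):
            if tok in protected:
                continue  # keep the input slot and the mask slot intact
            for new_tok in char_perturbations(tok):
                candidates.append(tokens[:i] + [new_tok] + tokens[i + 1:])
            candidates.append(tokens[:i] + tokens[i + 1:])  # word-level deletion
        if not candidates:
            break
        scored = [(score(" ".join(c)), c) for c in candidates]
        new_best, new_tokens = min(scored, key=lambda pair: pair[0])
        if new_best >= best:
            break  # no single edit helps any further
        best, tokens = new_best, new_tokens
    return " ".join(tokens)


def toy_score(template: str) -> float:
    """Stand-in for model accuracy: fraction of key template words still intact."""
    words = template.split()
    return sum(w in words for w in ("It", "was")) / 2


adversarial = greedy_attack("<text> It was [MASK] .", toy_score)
```

Each greedy step costs one query per candidate edit, so in a real black-box setting the query budget would bound `max_steps` and the number of candidates tried per step.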
