Abstract: DNN-based language models perform excellently on various tasks, but even SOTA LLMs are susceptible to textual adversarial attacks. Adversarial texts play crucial roles in multiple subfields of NLP. However, current research has the following issues: (1) Most textual adversarial attack methods target rich-resource languages. How do we generate adversarial texts for less-studied languages? (2) Most textual adversarial attack methods are prone to generating invalid or ambiguous adversarial texts. How do we construct high-quality adversarial robustness benchmarks? (3) New language models may be immune to some of the previously generated adversarial texts. How do we update adversarial robustness benchmarks? To address the above issues, we introduce HITL-GAT, a system based on a general approach to human-in-the-loop generation of adversarial texts. HITL-GAT contains four stages in one pipeline: victim model construction, adversarial example generation, high-quality benchmark construction, and adversarial robustness evaluation. Additionally, we use HITL-GAT to conduct a case study on Tibetan script, which can serve as a reference for adversarial research on other less-studied languages.
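The four-stage pipeline can be viewed as a thin orchestration layer over interchangeable components. The following is a minimal Python sketch under that reading, with the victim trainer, attack methods, human filter, and evaluator supplied as callables; all of these names are illustrative assumptions rather than the HITL-GAT API.

```python
from typing import Callable, Iterable, List, Tuple

def hitl_gat_pipeline(
    train_victim: Callable[[], object],                 # Stage 1: victim model construction
    attacks: Iterable[Callable[[object], List[str]]],   # Stage 2: adversarial example generation
    human_filter: Callable[[List[str]], List[str]],     # Stage 3: high-quality benchmark construction
    evaluate: Callable[[object, List[str]], float],     # Stage 4: adversarial robustness evaluation
) -> Tuple[List[str], float]:
    victim = train_victim()
    candidates: List[str] = []
    for attack in attacks:
        candidates.extend(attack(victim))   # pool adversarial texts from every attack method
    benchmark = human_filter(candidates)    # humans discard invalid or ambiguous texts
    score = evaluate(victim, benchmark)     # measure robustness on the curated benchmark
    return benchmark, score
```

Passing the stages as callables keeps the sketch self-contained while reflecting that each stage (victim model, attack method, annotation protocol, evaluation metric) can be swapped independently.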
Abstract: In social media, neural network models have been applied to hate speech detection, sentiment analysis, and other tasks, but these models are susceptible to adversarial attacks. For instance, in a text classification task, the attacker carefully introduces perturbations that barely alter the original semantics in order to trick the model into making different predictions. Studying textual adversarial attack methods makes it possible to evaluate and then improve the robustness of language models. Currently, most of the research in this field focuses on English, and there is also a certain amount of research on Chinese. However, there is little research targeting Chinese minority languages. With the rapid development of artificial intelligence technology and the emergence of Chinese minority language models, textual adversarial attacks have become a new challenge for the information processing of Chinese minority languages. In response to this situation, we propose TSTricker, a multi-granularity Tibetan textual adversarial attack method based on masked language models. We utilize masked language models to generate candidate substitution syllables or words, adopt a scoring mechanism to determine the substitution order, and then apply the attack to several fine-tuned victim models. The experimental results show that TSTricker reduces the accuracy of the classification models by more than 28.70% and makes the classification models change their predictions on more than 90.60% of the samples, an evidently stronger attack effect than the baseline method.
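To illustrate the two ingredients named above, here is a minimal Python sketch of masked-LM candidate generation and importance-based substitution ordering, assuming Hugging Face pipelines and whitespace tokenization; both model names are placeholders (TSTricker itself fine-tunes Tibetan models and operates on Tibetan syllables or words, not whitespace tokens).

```python
from typing import List, Tuple
from transformers import pipeline

# Placeholder model names; the actual victims are fine-tuned Tibetan classifiers.
fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")
victim = pipeline("text-classification", model="path/to/victim-classifier")

def importance_order(text: str) -> List[Tuple[int, float]]:
    """Rank token positions by how much masking them lowers the victim's confidence."""
    tokens = text.split()
    base = victim(text)[0]["score"]
    scores = []
    for i in range(len(tokens)):
        masked = " ".join(tokens[:i] + [fill_mask.tokenizer.mask_token] + tokens[i + 1:])
        scores.append((i, base - victim(masked)[0]["score"]))
    return sorted(scores, key=lambda s: s[1], reverse=True)

def substitution_candidates(text: str, position: int, top_k: int = 5) -> List[str]:
    """Let the masked language model propose replacements for one position."""
    tokens = text.split()
    tokens[position] = fill_mask.tokenizer.mask_token
    return [pred["token_str"] for pred in fill_mask(" ".join(tokens), top_k=top_k)]
```

Positions with the largest confidence drop are attacked first, and the masked LM supplies contextually plausible replacements for each of them.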
Abstract: A textual adversarial attack is an attack method in which the attacker adds carefully designed, imperceptible perturbations to the original texts so that the natural language processing (NLP) model produces false predictions. Such attacks are also used to evaluate the robustness of NLP models. Currently, most of the research in this field focuses on English, and there is also a certain amount of research on Chinese. However, to the best of our knowledge, there is little research targeting Chinese minority languages. Textual adversarial attacks are a new challenge for the information processing of Chinese minority languages. In response to this situation, we propose TSAttacker, a Tibetan syllable-level black-box textual adversarial attack based on syllable cosine distance and a scoring mechanism. We then apply TSAttacker to six models obtained by fine-tuning two pre-trained language models (PLMs) on three downstream tasks. The experimental results show that TSAttacker is effective and generates high-quality adversarial samples. In addition, the robustness of the involved models still has much room for improvement.
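The cosine-distance step can be sketched as follows, assuming a precomputed syllable embedding table passed in as a dictionary; the actual embeddings and candidate filtering used by TSAttacker are not reproduced here.

```python
from typing import Dict, List
import numpy as np

def nearest_syllables(syllable: str,
                      syllable_vectors: Dict[str, np.ndarray],
                      top_k: int = 5) -> List[str]:
    """Return the syllables closest to `syllable` under cosine distance."""
    v = syllable_vectors[syllable]

    def cosine_distance(u: np.ndarray) -> float:
        return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    ranked = sorted(
        (s for s in syllable_vectors if s != syllable),
        key=lambda s: cosine_distance(syllable_vectors[s]),
    )
    return ranked[:top_k]
```

Substituting the original syllable with its nearest neighbors keeps the perturbation semantically small, while the scoring mechanism decides which syllable in the sentence to replace first.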
Abstract: Language models based on deep neural networks are vulnerable to textual adversarial attacks. While rich-resource languages like English receive focused attention, Tibetan, a cross-border language, is gradually being studied owing to its abundant ancient literature and strategic linguistic importance. Currently, there are several Tibetan adversarial text generation methods, but they do not fully consider the textual features of Tibetan script and overestimate the quality of the generated adversarial texts. To address this issue, we propose a novel Tibetan adversarial text generation method called TSCheater, which exploits the characteristics of Tibetan encoding and the observation that visually similar syllables often have similar semantics. The method can also be transferred to other abugidas, such as the Devanagari script. We utilize a self-constructed Tibetan syllable visual similarity database called TSVSDB to generate substitution candidates and adopt a greedy-algorithm-based scoring mechanism to determine the substitution order. We then apply the method to eight victim language models. Experimentally, TSCheater outperforms existing methods in attack effectiveness, perturbation magnitude, semantic similarity, visual similarity, and human acceptance. Finally, we construct the first Tibetan adversarial robustness evaluation benchmark, AdvTS, which is generated by existing methods and proofread by humans.
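A greedy substitution loop of this kind can be sketched as follows, with TSVSDB approximated by a plain dictionary mapping each syllable to its visually similar candidates and the victim model abstracted as a callable returning a label and a confidence; this is an illustrative reading of the method, not its reference implementation.

```python
from typing import Callable, Dict, List, Tuple

def greedy_attack(syllables: List[str],
                  tsvsdb: Dict[str, List[str]],
                  victim: Callable[[List[str]], Tuple[str, float]]) -> List[str]:
    """Greedily replace syllables with visually similar ones until the label flips."""
    orig_label, _ = victim(syllables)
    perturbed = list(syllables)
    untouched = set(range(len(perturbed)))
    while untouched:
        best = None  # (position, candidate, resulting confidence)
        for i in untouched:
            for cand in tsvsdb.get(perturbed[i], []):
                trial = perturbed[:i] + [cand] + perturbed[i + 1:]
                label, conf = victim(trial)
                if label != orig_label:
                    return trial            # prediction flipped: adversarial text found
                if best is None or conf < best[2]:
                    best = (i, cand, conf)
        if best is None:
            break                           # no substitution candidates remain
        i, cand, _ = best
        perturbed[i] = cand                 # keep the substitution that hurts confidence most
        untouched.remove(i)
    return perturbed
```

Because every candidate is drawn from the visual similarity database, the resulting text stays close to the original in appearance, which is what the perturbation-magnitude and human-acceptance comparisons above measure.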