Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mengxin Zheng

TFHE-Coder: Evaluating LLM-agentic Fully Homomorphic Encryption Code Generation

Mar 15, 2025

Mayank Kumar, Jiaqi Xue, Mengxin Zheng, Qian Lou

Abstract:Fully Homomorphic Encryption over the torus (TFHE) enables computation on encrypted data without decryption, making it a cornerstone of secure and confidential computing. Despite its potential in privacy preserving machine learning, secure multi party computation, private blockchain transactions, and secure medical diagnostics, its adoption remains limited due to cryptographic complexity and usability challenges. While various TFHE libraries and compilers exist, practical code generation remains a hurdle. We propose a compiler integrated framework to evaluate LLM inference and agentic optimization for TFHE code generation, focusing on logic gates and ReLU activation. Our methodology assesses error rates, compilability, and structural similarity across open and closedsource LLMs. Results highlight significant limitations in off-the-shelf models, while agentic optimizations such as retrieval augmented generation (RAG) and few-shot prompting reduce errors and enhance code fidelity. This work establishes the first benchmark for TFHE code generation, demonstrating how LLMs, when augmented with domain-specific feedback, can bridge the expertise gap in FHE code generation.

* 8 pages, 7 figures

Via

Access Paper or Ask Questions

CipherPrune: Efficient and Scalable Private Transformer Inference

Feb 24, 2025

Yancheng Zhang, Jiaqi Xue, Mengxin Zheng, Mimi Xie, Mingzhe Zhang, Lei Jiang, Qian Lou

Abstract:Private Transformer inference using cryptographic protocols offers promising solutions for privacy-preserving machine learning; however, it still faces significant runtime overhead (efficiency issues) and challenges in handling long-token inputs (scalability issues). We observe that the Transformer's operational complexity scales quadratically with the number of input tokens, making it essential to reduce the input token length. Notably, each token varies in importance, and many inputs contain redundant tokens. Additionally, prior private inference methods that rely on high-degree polynomial approximations for non-linear activations are computationally expensive. Therefore, reducing the polynomial degree for less important tokens can significantly accelerate private inference. Building on these observations, we propose \textit{CipherPrune}, an efficient and scalable private inference framework that includes a secure encrypted token pruning protocol, a polynomial reduction protocol, and corresponding Transformer network optimizations. At the protocol level, encrypted token pruning adaptively removes unimportant tokens from encrypted inputs in a progressive, layer-wise manner. Additionally, encrypted polynomial reduction assigns lower-degree polynomials to less important tokens after pruning, enhancing efficiency without decryption. At the network level, we introduce protocol-aware network optimization via a gradient-based search to maximize pruning thresholds and polynomial reduction conditions while maintaining the desired accuracy. Our experiments demonstrate that CipherPrune reduces the execution overhead of private Transformer inference by approximately $6.1\times$ for 128-token inputs and $10.6\times$ for 512-token inputs, compared to previous methods, with only a marginal drop in accuracy. The code is publicly available at https://github.com/UCF-Lou-Lab-PET/cipher-prune-inference.

* Accepted by ICLR 2025

Via

Access Paper or Ask Questions

Uncovering the Hidden Threat of Text Watermarking from Users with Cross-Lingual Knowledge

Feb 23, 2025

Mansour Al Ghanim, Jiaqi Xue, Rochana Prih Hastuti, Mengxin Zheng, Yan Solihin, Qian Lou

Abstract:In this study, we delve into the hidden threats posed to text watermarking by users with cross-lingual knowledge. While most research focuses on watermarking methods for English, there is a significant gap in evaluating these methods in cross-lingual contexts. This oversight neglects critical adversary scenarios involving cross-lingual users, creating uncertainty regarding the effectiveness of cross-lingual watermarking. We assess four watermarking techniques across four linguistically rich languages, examining watermark resilience and text quality across various parameters and attacks. Our focus is on a realistic scenario featuring adversaries with cross-lingual expertise, evaluating the adequacy of current watermarking methods against such challenges.

* 9 pages

Via

Access Paper or Ask Questions

BadFair: Backdoored Fairness Attacks with Group-conditioned Triggers

Oct 23, 2024

Jiaqi Xue, Qian Lou, Mengxin Zheng

Abstract:Attacking fairness is crucial because compromised models can introduce biased outcomes, undermining trust and amplifying inequalities in sensitive applications like hiring, healthcare, and law enforcement. This highlights the urgent need to understand how fairness mechanisms can be exploited and to develop defenses that ensure both fairness and robustness. We introduce BadFair, a novel backdoored fairness attack methodology. BadFair stealthily crafts a model that operates with accuracy and fairness under regular conditions but, when activated by certain triggers, discriminates and produces incorrect results for specific groups. This type of attack is particularly stealthy and dangerous, as it circumvents existing fairness detection methods, maintaining an appearance of fairness in normal use. Our findings reveal that BadFair achieves a more than 85% attack success rate in attacks aimed at target groups on average while only incurring a minimal accuracy loss. Moreover, it consistently exhibits a significant discrimination score, distinguishing between pre-defined target and non-target attacked groups across various datasets and models.

* Accepted by EMNLP 2024

Via

Access Paper or Ask Questions

Jailbreaking LLMs with Arabic Transliteration and Arabizi

Jun 26, 2024

Mansour Al Ghanim, Saleh Almohaimeed, Mengxin Zheng, Yan Solihin, Qian Lou

Abstract:This study identifies the potential vulnerabilities of Large Language Models (LLMs) to 'jailbreak' attacks, specifically focusing on the Arabic language and its various forms. While most research has concentrated on English-based prompt manipulation, our investigation broadens the scope to investigate the Arabic language. We initially tested the AdvBench benchmark in Standardized Arabic, finding that even with prompt manipulation techniques like prefix injection, it was insufficient to provoke LLMs into generating unsafe content. However, when using Arabic transliteration and chatspeak (or arabizi), we found that unsafe content could be produced on platforms like OpenAI GPT-4 and Anthropic Claude 3 Sonnet. Our findings suggest that using Arabic and its various forms could expose information that might remain hidden, potentially increasing the risk of jailbreak attacks. We hypothesize that this exposure could be due to the model's learned connection to specific words, highlighting the need for more comprehensive safety training across all language forms.

* 14 pages, 4 figures

Via

Access Paper or Ask Questions

CR-UTP: Certified Robustness against Universal Text Perturbations

Jun 04, 2024

Qian Lou, Xin Liang, Jiaqi Xue, Yancheng Zhang, Rui Xie, Mengxin Zheng

Figure 1 for CR-UTP: Certified Robustness against Universal Text Perturbations

Figure 2 for CR-UTP: Certified Robustness against Universal Text Perturbations

Figure 3 for CR-UTP: Certified Robustness against Universal Text Perturbations

Figure 4 for CR-UTP: Certified Robustness against Universal Text Perturbations

Abstract:It is imperative to ensure the stability of every prediction made by a language model; that is, a language's prediction should remain consistent despite minor input variations, like word substitutions. In this paper, we investigate the problem of certifying a language model's robustness against Universal Text Perturbations (UTPs), which have been widely used in universal adversarial attacks and backdoor attacks. Existing certified robustness based on random smoothing has shown considerable promise in certifying the input-specific text perturbations (ISTPs), operating under the assumption that any random alteration of a sample's clean or adversarial words would negate the impact of sample-wise perturbations. However, with UTPs, masking only the adversarial words can eliminate the attack. A naive method is to simply increase the masking ratio and the likelihood of masking attack tokens, but it leads to a significant reduction in both certified accuracy and the certified radius due to input corruption by extensive masking. To solve this challenge, we introduce a novel approach, the superior prompt search method, designed to identify a superior prompt that maintains higher certified accuracy under extensive masking. Additionally, we theoretically motivate why ensembles are a particularly suitable choice as base prompts for random smoothing. The method is denoted by superior prompt ensembling technique. We also empirically confirm this technique, obtaining state-of-the-art results in multiple settings. These methodologies, for the first time, enable high certified accuracy against both UTPs and ISTPs. The source code of CR-UTP is available at https://github.com/UCFML-Research/CR-UTP.

* Accepted by ACL Findings 2024

Via

Access Paper or Ask Questions

BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models

Jun 03, 2024

Jiaqi Xue, Mengxin Zheng, Yebowen Hu, Fei Liu, Xun Chen, Qian Lou

Figure 1 for BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models

Figure 2 for BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models

Figure 3 for BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models

Figure 4 for BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models

Abstract:Large Language Models (LLMs) are constrained by outdated information and a tendency to generate incorrect data, commonly referred to as "hallucinations." Retrieval-Augmented Generation (RAG) addresses these limitations by combining the strengths of retrieval-based methods and generative models. This approach involves retrieving relevant information from a large, up-to-date dataset and using it to enhance the generation process, leading to more accurate and contextually appropriate responses. Despite its benefits, RAG introduces a new attack surface for LLMs, particularly because RAG databases are often sourced from public data, such as the web. In this paper, we propose \TrojRAG{} to identify the vulnerabilities and attacks on retrieval parts (RAG database) and their indirect attacks on generative parts (LLMs). Specifically, we identify that poisoning several customized content passages could achieve a retrieval backdoor, where the retrieval works well for clean queries but always returns customized poisoned adversarial queries. Triggers and poisoned passages can be highly customized to implement various attacks. For example, a trigger could be a semantic group like "The Republican Party, Donald Trump, etc." Adversarial passages can be tailored to different contents, not only linked to the triggers but also used to indirectly attack generative LLMs without modifying them. These attacks can include denial-of-service attacks on RAG and semantic steering attacks on LLM generations conditioned by the triggers. Our experiments demonstrate that by just poisoning 10 adversarial passages can induce 98.2\% success rate to retrieve the adversarial passages. Then, these passages can increase the reject ratio of RAG-based GPT-4 from 0.01\% to 74.6\% or increase the rate of negative responses from 0.22\% to 72\% for targeted queries.

Via

Access Paper or Ask Questions

TrojFSP: Trojan Insertion in Few-shot Prompt Tuning

Dec 16, 2023

Mengxin Zheng, Jiaqi Xue, Xun Chen, YanShan Wang, Qian Lou, Lei Jiang

Figure 1 for TrojFSP: Trojan Insertion in Few-shot Prompt Tuning

Figure 2 for TrojFSP: Trojan Insertion in Few-shot Prompt Tuning

Figure 3 for TrojFSP: Trojan Insertion in Few-shot Prompt Tuning

Figure 4 for TrojFSP: Trojan Insertion in Few-shot Prompt Tuning

Abstract:Prompt tuning is one of the most effective solutions to adapting a fixed pre-trained language model (PLM) for various downstream tasks, especially with only a few input samples. However, the security issues, e.g., Trojan attacks, of prompt tuning on a few data samples are not well-studied. Transferring established data poisoning attacks directly to few-shot prompt tuning presents multiple challenges. One significant issue is the \textit{poisoned imbalance issue}, where non-target class samples are added to the target class, resulting in a greater number of target-class samples compared to non-target class. While this issue is not critical in regular tuning, it significantly hampers the few-shot prompt tuning, making it difficult to simultaneously achieve a high attack success rate (ASR) and maintain clean data accuracy (CDA). Additionally, few-shot prompting is prone to overfitting in terms of both ASR and CDA. In this paper, we introduce \textit{TrojFSP}, a method designed to address the challenges. To solve the poisoned imbalance issue, we develop a \textit{Target-Class Shrink (TC-Shrink)} technique, which aims to equalize the number of poisoning samples. To combat overfitting, we employ a \textit{Selective Token Poisoning} technique to boost attack performance. Furthermore, we introduce a \textit{Trojan-Trigger Attention} objective function to amplify the attention of the poisoned trojan prompt on triggers. Experiments show that our TrojFSP achieves an ASR of over 99\% while maintaining negligible decreases in CDA across various PLMs and datasets.

* 10 pages, 2 figures

Via

Access Paper or Ask Questions

TrojFair: Trojan Fairness Attacks

Dec 16, 2023

Mengxin Zheng, Jiaqi Xue, Yi Sheng, Lei Yang, Qian Lou, Lei Jiang

Figure 1 for TrojFair: Trojan Fairness Attacks

Figure 2 for TrojFair: Trojan Fairness Attacks

Figure 3 for TrojFair: Trojan Fairness Attacks

Figure 4 for TrojFair: Trojan Fairness Attacks

Abstract:Deep learning models have been incorporated into high-stakes sectors, including healthcare diagnosis, loan approvals, and candidate recruitment, among others. Consequently, any bias or unfairness in these models can harm those who depend on such models. In response, many algorithms have emerged to ensure fairness in deep learning. However, while the potential for harm is substantial, the resilience of these fair deep learning models against malicious attacks has never been thoroughly explored, especially in the context of emerging Trojan attacks. Moving beyond prior research, we aim to fill this void by introducing \textit{TrojFair}, a Trojan fairness attack. Unlike existing attacks, TrojFair is model-agnostic and crafts a Trojaned model that functions accurately and equitably for clean inputs. However, it displays discriminatory behaviors \text{-} producing both incorrect and unfair results \text{-} for specific groups with tainted inputs containing a trigger. TrojFair is a stealthy Fairness attack that is resilient to existing model fairness audition detectors since the model for clean inputs is fair. TrojFair achieves a target group attack success rate exceeding $88.77\%$, with an average accuracy loss less than $0.44\%$. It also maintains a high discriminative score between the target and non-target groups across various datasets and models.

* 12 pages, 2 figures

Via

Access Paper or Ask Questions

TrojPrompt: A Black-box Trojan Attack on Pre-trained Language Models

Jun 12, 2023

Jiaqi Xue, Yepeng Liu, Mengxin Zheng, Ting Hua, Yilin Shen, Ladislau Boloni, Qian Lou

Figure 1 for TrojPrompt: A Black-box Trojan Attack on Pre-trained Language Models

Figure 2 for TrojPrompt: A Black-box Trojan Attack on Pre-trained Language Models

Figure 3 for TrojPrompt: A Black-box Trojan Attack on Pre-trained Language Models

Figure 4 for TrojPrompt: A Black-box Trojan Attack on Pre-trained Language Models

Abstract:Prompt learning has been proven to be highly effective in improving pre-trained language model (PLM) adaptability, surpassing conventional fine-tuning paradigms, and showing exceptional promise in an ever-growing landscape of applications and APIs tailored for few-shot learning scenarios. Despite the growing prominence of prompt learning-based APIs, their security concerns remain underexplored. In this paper, we undertake a pioneering study on the Trojan susceptibility of prompt-learning PLM APIs. We identified several key challenges, including discrete-prompt, few-shot, and black-box settings, which limit the applicability of existing backdoor attacks. To address these challenges, we propose TrojPrompt, an automatic and black-box framework to effectively generate universal and stealthy triggers and insert Trojans into hard prompts. Specifically, we propose a universal API-driven trigger discovery algorithm for generating universal triggers for various inputs by querying victim PLM APIs using few-shot data samples. Furthermore, we introduce a novel progressive trojan poisoning algorithm designed to generate poisoned prompts that retain efficacy and transferability across a diverse range of models. Our experiments and results demonstrate TrojPrompt's capacity to effectively insert Trojans into text prompts in real-world black-box PLM APIs, while maintaining exceptional performance on clean test sets and significantly outperforming baseline models. Our work sheds light on the potential security risks in current models and offers a potential defensive approach.

* 14 pages, 4 figures, 5 tables

Via

Access Paper or Ask Questions