Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xiaojun Chen

Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity

May 12, 2025

Guang Yan, Yuhui Zhang, Zimu Guo, Lutan Zhao, Xiaojun Chen, Chen Wang, Wenhao Wang, Dan Meng, Rui Hou

Abstract:With the growing use of large language models (LLMs) hosted on cloud platforms to offer inference services, privacy concerns about the potential leakage of sensitive information are escalating. Secure multi-party computation (MPC) is a promising solution to protect the privacy in LLM inference. However, MPC requires frequent inter-server communication, causing high performance overhead. Inspired by the prevalent activation sparsity of LLMs, where most neuron are not activated after non-linear activation functions, we propose an efficient private inference system, Comet. This system employs an accurate and fast predictor to predict the sparsity distribution of activation function output. Additionally, we introduce a new private inference protocol. It efficiently and securely avoids computations involving zero values by exploiting the spatial locality of the predicted sparse distribution. While this computation-avoidance approach impacts the spatiotemporal continuity of KV cache entries, we address this challenge with a low-communication overhead cache refilling strategy that merges miss requests and incorporates a prefetching mechanism. Finally, we evaluate Comet on four common LLMs and compare it with six state-of-the-art private inference systems. Comet achieves a 1.87x-2.63x speedup and a 1.94x-2.64x communication reduction.

* Accepted to SP 2025

Via

Access Paper or Ask Questions

The Distributionally Robust Optimization Model of Sparse Principal Component Analysis

Mar 04, 2025

Lei Wang, Xin Liu, Xiaojun Chen

Abstract:We consider sparse principal component analysis (PCA) under a stochastic setting where the underlying probability distribution of the random parameter is uncertain. This problem is formulated as a distributionally robust optimization (DRO) model based on a constructive approach to capturing uncertainty in the covariance matrix, which constitutes a nonsmooth constrained min-max optimization problem. We further prove that the inner maximization problem admits a closed-form solution, reformulating the original DRO model into an equivalent minimization problem on the Stiefel manifold. This transformation leads to a Riemannian optimization problem with intricate nonsmooth terms, a challenging formulation beyond the reach of existing algorithms. To address this issue, we devise an efficient smoothing manifold proximal gradient algorithm. We prove the Riemannian gradient consistency and global convergence of our algorithm to a stationary point of the nonsmooth minimization problem. Moreover, we establish the iteration complexity of our algorithm. Finally, numerical experiments are conducted to validate the effectiveness and scalability of our algorithm, as well as to highlight the necessity and rationality of adopting the DRO model for sparse PCA.

Via

Access Paper or Ask Questions

Antelope: Potent and Concealed Jailbreak Attack Strategy

Dec 11, 2024

Xin Zhao, Xiaojun Chen, Haoyu Gao

Abstract:Due to the remarkable generative potential of diffusion-based models, numerous researches have investigated jailbreak attacks targeting these frameworks. A particularly concerning threat within image models is the generation of Not-Safe-for-Work (NSFW) content. Despite the implementation of security filters, numerous efforts continue to explore ways to circumvent these safeguards. Current attack methodologies primarily encompass adversarial prompt engineering or concept obfuscation, yet they frequently suffer from slow search efficiency, conspicuous attack characteristics and poor alignment with targets. To overcome these challenges, we propose Antelope, a more robust and covert jailbreak attack strategy designed to expose security vulnerabilities inherent in generative models. Specifically, Antelope leverages the confusion of sensitive concepts with similar ones, facilitates searches in the semantically adjacent space of these related concepts and aligns them with the target imagery, thereby generating sensitive images that are consistent with the target and capable of evading detection. Besides, we successfully exploit the transferability of model-based attacks to penetrate online black-box services. Experimental evaluations demonstrate that Antelope outperforms existing baselines across multiple defensive mechanisms, underscoring its efficacy and versatility.

Via

Access Paper or Ask Questions

Buster: Incorporating Backdoor Attacks into Text Encoder to Mitigate NSFW Content Generation

Dec 10, 2024

Xin Zhao, Xiaojun Chen, Yuexin Xuan, Zhendong Zhao

Figure 1 for Buster: Incorporating Backdoor Attacks into Text Encoder to Mitigate NSFW Content Generation

Figure 2 for Buster: Incorporating Backdoor Attacks into Text Encoder to Mitigate NSFW Content Generation

Figure 3 for Buster: Incorporating Backdoor Attacks into Text Encoder to Mitigate NSFW Content Generation

Figure 4 for Buster: Incorporating Backdoor Attacks into Text Encoder to Mitigate NSFW Content Generation

Abstract:In the digital age, the proliferation of deep learning models has led to significant concerns about the generation of Not Safe for Work (NSFW) content. Existing defense methods primarily involve model fine-tuning and post-hoc content moderation. However, these approaches often lack scalability in eliminating harmful content, degrade the quality of benign image generation, or incur high inference costs. To tackle these challenges, we propose an innovative framework called \textbf{Buster}, which injects backdoor attacks into the text encoder to prevent NSFW content generation. Specifically, Buster leverages deep semantic information rather than explicit prompts as triggers, redirecting NSFW prompts towards targeted benign prompts. This approach demonstrates exceptional resilience and scalability in mitigating NSFW content. Remarkably, Buster fine-tunes the text encoder of Text-to-Image models within just five minutes, showcasing high efficiency. Our extensive experiments reveal that Buster outperforms all other baselines, achieving superior NSFW content removal rate while preserving the quality of harmless images.

Via

Access Paper or Ask Questions

CipherDM: Secure Three-Party Inference for Diffusion Model Sampling

Sep 09, 2024

Xin Zhao, Xiaojun Chen, Xudong Chen, He Li, Tingyu Fan, Zhendong Zhao

Figure 1 for CipherDM: Secure Three-Party Inference for Diffusion Model Sampling

Figure 2 for CipherDM: Secure Three-Party Inference for Diffusion Model Sampling

Figure 3 for CipherDM: Secure Three-Party Inference for Diffusion Model Sampling

Figure 4 for CipherDM: Secure Three-Party Inference for Diffusion Model Sampling

Abstract:Diffusion Models (DMs) achieve state-of-the-art synthesis results in image generation and have been applied to various fields. However, DMs sometimes seriously violate user privacy during usage, making the protection of privacy an urgent issue. Using traditional privacy computing schemes like Secure Multi-Party Computation (MPC) directly in DMs faces significant computation and communication challenges. To address these issues, we propose CipherDM, the first novel, versatile and universal framework applying MPC technology to DMs for secure sampling, which can be widely implemented on multiple DM based tasks. We thoroughly analyze sampling latency breakdown, find time-consuming parts and design corresponding secure MPC protocols for computing nonlinear activations including SoftMax, SiLU and Mish. CipherDM is evaluated on popular architectures (DDPM, DDIM) using MNIST dataset and on SD deployed by diffusers. Compared to direct implementation on SPU, our approach improves running time by approximately 1.084\times \sim 2.328\times, and reduces communication costs by approximately 1.212\times \sim 1.791\times.

Via

Access Paper or Ask Questions

Orthogonal Constrained Minimization with Tensor $\ell_{2,p}$ Regularization for HSI Denoising and Destriping

Jul 04, 2024

Xiaoxia Liu, Shijie Yu, Jian Lu, Xiaojun Chen

$Figure 1 for Orthogonal Constrained Minimization with Tensor $\ell_{2,p}$ Regularization for HSI Denoising and Destriping$

$Figure 2 for Orthogonal Constrained Minimization with Tensor $\ell_{2,p}$ Regularization for HSI Denoising and Destriping$

$Figure 3 for Orthogonal Constrained Minimization with Tensor $\ell_{2,p}$ Regularization for HSI Denoising and Destriping$

$Figure 4 for Orthogonal Constrained Minimization with Tensor $\ell_{2,p}$ Regularization for HSI Denoising and Destriping$

Abstract:Hyperspectral images (HSIs) are often contaminated by a mixture of noises such as Gaussian noise, dead lines, stripes, and so on. In this paper, we propose a novel approach for HSI denoising and destriping, called NLTL2p, which consists of an orthogonal constrained minimization model and an iterative algorithm with convergence guarantees. The model of the proposed NLTL2p approach is built based on a new sparsity-enhanced Nonlocal Low-rank Tensor regularization and a tensor $\ell_{2,p}$ norm with $p\in(0,1)$. The low-rank constraints for HSI denoising utilize the spatial nonlocal self-similarity and spectral correlation of HSIs and are formulated based on independent higher-order singular value decomposition with sparsity enhancement on its core tensor to prompt more low-rankness. The tensor $\ell_{2,p}$ norm for HSI destriping is extended from the matrix $\ell_{2,p}$ norm. A proximal block coordinate descent algorithm is proposed in the NLTL2p approach to solve the resulting nonconvex nonsmooth minimization with orthogonal constraints. We show any accumulation point of the sequence generated by the proposed algorithm converges to a first-order stationary point, which is defined using three equalities of substationarity, symmetry, and feasibility for orthogonal constraints. In the numerical experiments, we compare the proposed method with state-of-the-art methods including a deep learning based method, and test the methods on both simulated and real HSI datasets. Our proposed NLTL2p method demonstrates outperformance in terms of metrics such as mean peak signal-to-noise ratio as well as visual quality.

Via

Access Paper or Ask Questions

Federated Learning with Limited Node Labels

Jun 18, 2024

Bisheng Tang, Xiaojun Chen, Shaopu Wang, Yuexin Xuan, Zhendong Zhao

Abstract:Subgraph federated learning (SFL) is a research methodology that has gained significant attention for its potential to handle distributed graph-structured data. In SFL, the local model comprises graph neural networks (GNNs) with a partial graph structure. However, some SFL models have overlooked the significance of missing cross-subgraph edges, which can lead to local GNNs being unable to message-pass global representations to other parties' GNNs. Moreover, existing SFL models require substantial labeled data, which limits their practical applications. To overcome these limitations, we present a novel SFL framework called FedMpa that aims to learn cross-subgraph node representations. FedMpa first trains a multilayer perceptron (MLP) model using a small amount of data and then propagates the federated feature to the local structures. To further improve the embedding representation of nodes with local subgraphs, we introduce the FedMpae method, which reconstructs the local graph structure with an innovation view that applies pooling operation to form super-nodes. Our extensive experiments on six graph datasets demonstrate that FedMpa is highly effective in node classification. Furthermore, our ablation experiments verify the effectiveness of FedMpa.

Via

Access Paper or Ask Questions

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

Jun 11, 2024

Ziqiang Liu, Feiteng Fang, Xi Feng, Xinrun Du, Chenhao Zhang, Zekun Wang, Yuelin Bai, Qixuan Zhao, Liyang Fan, Chengguang Gan(+16 more)

Figure 1 for II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

Figure 2 for II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

Figure 3 for II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

Figure 4 for II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

Abstract:The rapid advancements in the development of multimodal large language models (MLLMs) have consistently led to new breakthroughs on various benchmarks. In response, numerous challenging and comprehensive benchmarks have been proposed to more accurately assess the capabilities of MLLMs. However, there is a dearth of exploration of the higher-order perceptual capabilities of MLLMs. To fill this gap, we propose the Image Implication understanding Benchmark, II-Bench, which aims to evaluate the model's higher-order perception of images. Through extensive experiments on II-Bench across multiple MLLMs, we have made significant findings. Initially, a substantial gap is observed between the performance of MLLMs and humans on II-Bench. The pinnacle accuracy of MLLMs attains 74.8%, whereas human accuracy averages 90%, peaking at an impressive 98%. Subsequently, MLLMs perform worse on abstract and complex images, suggesting limitations in their ability to understand high-level semantics and capture image details. Finally, it is observed that most models exhibit enhanced accuracy when image sentiment polarity hints are incorporated into the prompts. This observation underscores a notable deficiency in their inherent understanding of image sentiment. We believe that II-Bench will inspire the community to develop the next generation of MLLMs, advancing the journey towards expert artificial general intelligence (AGI). II-Bench is publicly available at https://huggingface.co/datasets/m-a-p/II-Bench.

* 100 pages, 82 figures, add citations

Via

Access Paper or Ask Questions

Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training

May 31, 2024

Feiteng Fang, Yuelin Bai, Shiwen Ni, Min Yang, Xiaojun Chen, Ruifeng Xu

Figure 1 for Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training

Figure 2 for Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training

Figure 3 for Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training

Figure 4 for Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training

Abstract:Large Language Models (LLMs) exhibit substantial capabilities yet encounter challenges, including hallucination, outdated knowledge, and untraceable reasoning processes. Retrieval-augmented generation (RAG) has emerged as a promising solution, integrating knowledge from external databases to mitigate these challenges. However, inappropriate retrieved passages can potentially hinder the LLMs' capacity to generate comprehensive and high-quality responses. Prior RAG studies on the robustness of retrieval noises often confine themselves to a limited set of noise types, deviating from real-world retrieval environments and limiting practical applicability. In this study, we initially investigate retrieval noises and categorize them into three distinct types, reflecting real-world environments. We analyze the impact of these various retrieval noises on the robustness of LLMs. Subsequently, we propose a novel RAG approach known as Retrieval-augmented Adaptive Adversarial Training (RAAT). RAAT leverages adaptive adversarial training to dynamically adjust the model's training process in response to retrieval noises. Concurrently, it employs multi-task learning to ensure the model's capacity to internally recognize noisy contexts. Extensive experiments demonstrate that the LLaMA-2 7B model trained using RAAT exhibits significant improvements in F1 and EM scores under diverse noise conditions. For reproducibility, we release our code and data at: https://github.com/calubkk/RAAT.

* ACL 2024, Main Conference

Via

Access Paper or Ask Questions

Open-SQL Framework: Enhancing Text-to-SQL on Open-source Large Language Models

May 04, 2024

Xiaojun Chen, Tianle Wang, Tianhao Qiu, Jianbin Qin, Min Yang

Figure 1 for Open-SQL Framework: Enhancing Text-to-SQL on Open-source Large Language Models

Figure 2 for Open-SQL Framework: Enhancing Text-to-SQL on Open-source Large Language Models

Figure 3 for Open-SQL Framework: Enhancing Text-to-SQL on Open-source Large Language Models

Figure 4 for Open-SQL Framework: Enhancing Text-to-SQL on Open-source Large Language Models

Abstract:Despite the success of large language models (LLMs) in Text-to-SQL tasks, open-source LLMs encounter challenges in contextual understanding and response coherence. To tackle these issues, we present \ours, a systematic methodology tailored for Text-to-SQL with open-source LLMs. Our contributions include a comprehensive evaluation of open-source LLMs in Text-to-SQL tasks, the \openprompt strategy for effective question representation, and novel strategies for supervised fine-tuning. We explore the benefits of Chain-of-Thought in step-by-step inference and propose the \openexample method for enhanced few-shot learning. Additionally, we introduce token-efficient techniques, such as \textbf{Variable-length Open DB Schema}, \textbf{Target Column Truncation}, and \textbf{Example Column Truncation}, addressing challenges in large-scale databases. Our findings emphasize the need for further investigation into the impact of supervised fine-tuning on contextual learning capabilities. Remarkably, our method significantly improved Llama2-7B from 2.54\% to 41.04\% and Code Llama-7B from 14.54\% to 48.24\% on the BIRD-Dev dataset. Notably, the performance of Code Llama-7B surpassed GPT-4 (46.35\%) on the BIRD-Dev dataset.

Via

Access Paper or Ask Questions