Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xiuwen Liu

Universal and Transferable Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent

Aug 20, 2025

Sajib Biswas, Mao Nishino, Samuel Jacob Chacko, Xiuwen Liu

Abstract:As large language models (LLMs) are increasingly deployed in critical applications, ensuring their robustness and safety alignment remains a major challenge. Despite the overall success of alignment techniques such as reinforcement learning from human feedback (RLHF) on typical prompts, LLMs remain vulnerable to jailbreak attacks enabled by crafted adversarial triggers appended to user prompts. Most existing jailbreak methods either rely on inefficient searches over discrete token spaces or direct optimization of continuous embeddings. While continuous embeddings can be given directly to selected open-source models as input, doing so is not feasible for proprietary models. On the other hand, projecting these embeddings back into valid discrete tokens introduces additional complexity and often reduces attack effectiveness. We propose an intrinsic optimization method which directly optimizes relaxed one-hot encodings of the adversarial suffix tokens using exponentiated gradient descent coupled with Bregman projection, ensuring that the optimized one-hot encoding of each token always remains within the probability simplex. We provide theoretical proof of convergence for our proposed method and implement an efficient algorithm that effectively jailbreaks several widely used LLMs. Our method achieves higher success rates and faster convergence compared to three state-of-the-art baselines, evaluated on five open-source LLMs and four adversarial behavior datasets curated for evaluating jailbreak methods. In addition to individual prompt attacks, we also generate universal adversarial suffixes effective across multiple prompts and demonstrate transferability of optimized suffixes to different LLMs.

Via

Access Paper or Ask Questions

Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical Imaging

Jul 02, 2025

Montasir Shams, Chashi Mahiul Islam, Shaeke Salman, Phat Tran, Xiuwen Liu

Abstract:Vision transformers (ViTs) have rapidly gained prominence in medical imaging tasks such as disease classification, segmentation, and detection due to their superior accuracy compared to conventional deep learning models. However, due to their size and complex interactions via the self-attention mechanism, they are not well understood. In particular, it is unclear whether the representations produced by such models are semantically meaningful. In this paper, using a projected gradient-based algorithm, we show that their representations are not semantically meaningful and they are inherently vulnerable to small changes. Images with imperceptible differences can have very different representations; on the other hand, images that should belong to different semantic classes can have nearly identical representations. Such vulnerability can lead to unreliable classification results; for example, unnoticeable changes cause the classification accuracy to be reduced by over 60\%. %. To the best of our knowledge, this is the first work to systematically demonstrate this fundamental lack of semantic meaningfulness in ViT representations for medical image classification, revealing a critical challenge for their deployment in safety-critical systems.

* 9 pages

Via

Access Paper or Ask Questions

Adversarial Attack on Large Language Models using Exponentiated Gradient Descent

May 14, 2025

Sajib Biswas, Mao Nishino, Samuel Jacob Chacko, Xiuwen Liu

Abstract:As Large Language Models (LLMs) are widely used, understanding them systematically is key to improving their safety and realizing their full potential. Although many models are aligned using techniques such as reinforcement learning from human feedback (RLHF), they are still vulnerable to jailbreaking attacks. Some of the existing adversarial attack methods search for discrete tokens that may jailbreak a target model while others try to optimize the continuous space represented by the tokens of the model's vocabulary. While techniques based on the discrete space may prove to be inefficient, optimization of continuous token embeddings requires projections to produce discrete tokens, which might render them ineffective. To fully utilize the constraints and the structures of the space, we develop an intrinsic optimization technique using exponentiated gradient descent with the Bregman projection method to ensure that the optimized one-hot encoding always stays within the probability simplex. We prove the convergence of the technique and implement an efficient algorithm that is effective in jailbreaking several widely used LLMs. We demonstrate the efficacy of the proposed technique using five open-source LLMs on four openly available datasets. The results show that the technique achieves a higher success rate with great efficiency compared to three other state-of-the-art jailbreaking techniques. The source code for our implementation is available at: https://github.com/sbamit/Exponentiated-Gradient-Descent-LLM-Attack

* Accepted to International Joint Conference on Neural Networks (IJCNN) 2025

Via

Access Paper or Ask Questions

Minimum Description Length of a Spectrum Variational Autoencoder: A Theory

Apr 01, 2025

Canlin Zhang, Xiuwen Liu

Figure 1 for Minimum Description Length of a Spectrum Variational Autoencoder: A Theory

Abstract:Deep neural networks (DNNs) trained through end-to-end learning have achieved remarkable success across diverse machine learning tasks, yet they are not explicitly designed to adhere to the Minimum Description Length (MDL) principle, which posits that the best model provides the shortest description of the data. In this paper, we argue that MDL is essential to deep learning and propose a further generalized principle: Understanding is the use of a small amount of information to represent a large amount of information. To this end, we introduce a novel theoretical framework for designing and evaluating deep Variational Autoencoders (VAEs) based on MDL. In our theory, we designed the Spectrum VAE, a specific VAE architecture whose MDL can be rigorously evaluated under given conditions. Additionally, we introduce the concept of latent dimension combination, or pattern of spectrum, and provide the first theoretical analysis of their role in achieving MDL. We claim that a Spectrum VAE understands the data distribution in the most appropriate way when the MDL is achieved. This work is entirely theoretical and lays the foundation for future research on designing deep learning systems that explicitly adhere to information-theoretic principles.

Via

Access Paper or Ask Questions

DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities

Feb 11, 2025

Chashi Mahiul Islam, Samuel Jacob Chacko, Preston Horne, Xiuwen Liu

Figure 1 for DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities

Figure 2 for DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities

Figure 3 for DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities

Figure 4 for DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities

Abstract:Multimodal Large Language Models (MLLMs) represent the cutting edge of AI technology, with DeepSeek models emerging as a leading open-source alternative offering competitive performance to closed-source systems. While these models demonstrate remarkable capabilities, their vision-language integration mechanisms introduce specific vulnerabilities. We implement an adapted embedding manipulation attack on DeepSeek Janus that induces targeted visual hallucinations through systematic optimization of image embeddings. Through extensive experimentation across COCO, DALL-E 3, and SVIT datasets, we achieve hallucination rates of up to 98.0% while maintaining high visual fidelity (SSIM > 0.88) of the manipulated images on open-ended questions. Our analysis demonstrates that both 1B and 7B variants of DeepSeek Janus are susceptible to these attacks, with closed-form evaluation showing consistently higher hallucination rates compared to open-ended questioning. We introduce a novel multi-prompt hallucination detection framework using LLaMA-3.1 8B Instruct for robust evaluation. The implications of these findings are particularly concerning given DeepSeek's open-source nature and widespread deployment potential. This research emphasizes the critical need for embedding-level security measures in MLLM deployment pipelines and contributes to the broader discussion of responsible AI implementation.

* 19 pages, 4 figures

Via

Access Paper or Ask Questions

Mechanistic Understandings of Representation Vulnerabilities and Engineering Robust Vision Transformers

Feb 07, 2025

Chashi Mahiul Islam, Samuel Jacob Chacko, Mao Nishino, Xiuwen Liu

Abstract:While transformer-based models dominate NLP and vision applications, their underlying mechanisms to map the input space to the label space semantically are not well understood. In this paper, we study the sources of known representation vulnerabilities of vision transformers (ViT), where perceptually identical images can have very different representations and semantically unrelated images can have the same representation. Our analysis indicates that imperceptible changes to the input can result in significant representation changes, particularly in later layers, suggesting potential instabilities in the performance of ViTs. Our comprehensive study reveals that adversarial effects, while subtle in early layers, propagate and amplify through the network, becoming most pronounced in middle to late layers. This insight motivates the development of NeuroShield-ViT, a novel defense mechanism that strategically neutralizes vulnerable neurons in earlier layers to prevent the cascade of adversarial effects. We demonstrate NeuroShield-ViT's effectiveness across various attacks, particularly excelling against strong iterative attacks, and showcase its remarkable zero-shot generalization capabilities. Without fine-tuning, our method achieves a competitive accuracy of 77.8% on adversarial examples, surpassing conventional robustness methods. Our results shed new light on how adversarial effects propagate through ViT layers, while providing a promising approach to enhance the robustness of vision transformers against adversarial attacks. Additionally, they provide a promising approach to enhance the robustness of vision transformers against adversarial attacks.

* 10 pages, 5 figures

Via

Access Paper or Ask Questions

Data-Driven Fairness Generalization for Deepfake Detection

Dec 31, 2024

Uzoamaka Ezeakunne, Chrisantus Eze, Xiuwen Liu

Figure 1 for Data-Driven Fairness Generalization for Deepfake Detection

Figure 2 for Data-Driven Fairness Generalization for Deepfake Detection

Figure 3 for Data-Driven Fairness Generalization for Deepfake Detection

Figure 4 for Data-Driven Fairness Generalization for Deepfake Detection

Abstract:Despite the progress made in deepfake detection research, recent studies have shown that biases in the training data for these detectors can result in varying levels of performance across different demographic groups, such as race and gender. These disparities can lead to certain groups being unfairly targeted or excluded. Traditional methods often rely on fair loss functions to address these issues, but they under-perform when applied to unseen datasets, hence, fairness generalization remains a challenge. In this work, we propose a data-driven framework for tackling the fairness generalization problem in deepfake detection by leveraging synthetic datasets and model optimization. Our approach focuses on generating and utilizing synthetic data to enhance fairness across diverse demographic groups. By creating a diverse set of synthetic samples that represent various demographic groups, we ensure that our model is trained on a balanced and representative dataset. This approach allows us to generalize fairness more effectively across different domains. We employ a comprehensive strategy that leverages synthetic data, a loss sharpness-aware optimization pipeline, and a multi-task learning framework to create a more equitable training environment, which helps maintain fairness across both intra-dataset and cross-dataset evaluations. Extensive experiments on benchmark deepfake detection datasets demonstrate the efficacy of our approach, surpassing state-of-the-art approaches in preserving fairness during cross-dataset evaluation. Our results highlight the potential of synthetic datasets in achieving fairness generalization, providing a robust solution for the challenges faced in deepfake detection.

* Accepted at ICAART 2025

Via

Access Paper or Ask Questions

Deepfake detection, image manipulation detection, fairness, generalization

Dec 21, 2024

Uzoamaka Ezeakunne, Chrisantus Eze, Xiuwen Liu

Figure 1 for Deepfake detection, image manipulation detection, fairness, generalization

Figure 2 for Deepfake detection, image manipulation detection, fairness, generalization

Figure 3 for Deepfake detection, image manipulation detection, fairness, generalization

Figure 4 for Deepfake detection, image manipulation detection, fairness, generalization

* Accepted at ICAART 2025

Via

Access Paper or Ask Questions

Malicious Path Manipulations via Exploitation of Representation Vulnerabilities of Vision-Language Navigation Systems

Jul 10, 2024

Chashi Mahiul Islam, Shaeke Salman, Montasir Shams, Xiuwen Liu, Piyush Kumar

Figure 1 for Malicious Path Manipulations via Exploitation of Representation Vulnerabilities of Vision-Language Navigation Systems

Figure 2 for Malicious Path Manipulations via Exploitation of Representation Vulnerabilities of Vision-Language Navigation Systems

Figure 3 for Malicious Path Manipulations via Exploitation of Representation Vulnerabilities of Vision-Language Navigation Systems

Figure 4 for Malicious Path Manipulations via Exploitation of Representation Vulnerabilities of Vision-Language Navigation Systems

Abstract:Building on the unprecedented capabilities of large language models for command understanding and zero-shot recognition of multi-modal vision-language transformers, visual language navigation (VLN) has emerged as an effective way to address multiple fundamental challenges toward a natural language interface to robot navigation. However, such vision-language models are inherently vulnerable due to the lack of semantic meaning of the underlying embedding space. Using a recently developed gradient based optimization procedure, we demonstrate that images can be modified imperceptibly to match the representation of totally different images and unrelated texts for a vision-language model. Building on this, we develop algorithms that can adversarially modify a minimal number of images so that the robot will follow a route of choice for commands that require a number of landmarks. We demonstrate that experimentally using a recently proposed VLN system; for a given navigation command, a robot can be made to follow drastically different routes. We also develop an efficient algorithm to detect such malicious modifications reliably based on the fact that the adversarially modified images have much higher sensitivity to added Gaussian noise than the original images.

* 8 pages, 5 figures. This paper has been accepted for publication at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024

Via

Access Paper or Ask Questions

Unaligning Everything: Or Aligning Any Text to Any Image in Multimodal Models

Jul 01, 2024

Shaeke Salman, Md Montasir Bin Shams, Xiuwen Liu

Abstract:Utilizing a shared embedding space, emerging multimodal models exhibit unprecedented zero-shot capabilities. However, the shared embedding space could lead to new vulnerabilities if different modalities can be misaligned. In this paper, we extend and utilize a recently developed effective gradient-based procedure that allows us to match the embedding of a given text by minimally modifying an image. Using the procedure, we show that we can align the embeddings of distinguishable texts to any image through unnoticeable adversarial attacks in joint image-text models, revealing that semantically unrelated images can have embeddings of identical texts and at the same time visually indistinguishable images can be matched to the embeddings of very different texts. Our technique achieves 100\% success rate when it is applied to text datasets and images from multiple sources. Without overcoming the vulnerability, multimodal models cannot robustly align inputs from different modalities in a semantically meaningful way. \textbf{Warning: the text data used in this paper are toxic in nature and may be offensive to some readers.}

* 14 pages, 14 figures. arXiv admin note: substantial text overlap with arXiv:2401.15568, arXiv:2402.08473

Via

Access Paper or Ask Questions