Abstract: In the burgeoning field of Large Language Models (LLMs), developing a robust safety mechanism, colloquially known as "safeguards" or "guardrails", has become imperative to ensure the ethical use of LLMs within prescribed boundaries. This article provides a systematic literature review of the current status of this critical mechanism, discussing its major challenges and how it can be enhanced into a comprehensive mechanism that deals with ethical issues in various contexts. First, the paper elucidates the current landscape of safeguarding mechanisms employed by major LLM service providers and the open-source community. This is followed by techniques to evaluate, analyze, and enhance the (un)desirable properties that a guardrail might need to enforce, such as hallucination, fairness, and privacy. Building on these, we review techniques to circumvent such controls (i.e., attacks), to defend against those attacks, and to reinforce the guardrails. While the techniques above represent the current status and active research trends, we also discuss several challenges that cannot easily be addressed by these methods, and present our vision of how to implement a comprehensive guardrail through full consideration of a multi-disciplinary approach, neural-symbolic methods, and the systems development lifecycle.
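To make concrete what such an input/output safeguarding mechanism does, here is a minimal sketch of a guardrail that screens both the prompt and the completion. The pattern list and the `llm` callable are hypothetical placeholders; deployed guardrails are far more sophisticated than keyword matching.

```python
import re

# Hypothetical deny-list; real guardrails use learned classifiers, policy
# engines, and context-aware checks rather than regexes alone.
BLOCKED_PATTERNS = [
    r"\bhow to build a bomb\b",
    r"\bcredit card number\b",
]

def guard(text: str) -> bool:
    """Return True if the text passes the guardrail, False if it is blocked."""
    return not any(re.search(p, text, re.IGNORECASE) for p in BLOCKED_PATTERNS)

def safeguarded_call(llm, prompt: str) -> str:
    # Input-side check: refuse disallowed prompts before they reach the model.
    if not guard(prompt):
        return "Request declined by input guardrail."
    response = llm(prompt)
    # Output-side check: filter unsafe completions before they reach the user.
    if not guard(response):
        return "Response withheld by output guardrail."
    return response
```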
Abstract: Spiking Neural Networks (SNNs) are acknowledged as the next generation of Artificial Neural Networks (ANNs) and hold great promise for effectively processing spatial-temporal information. However, the choice of timestep becomes crucial, as it significantly impacts the accuracy of neural network training. Specifically, a smaller timestep enables more efficient computing, with reduced latency and fewer operations; however, a small timestep may also lead to low accuracy, because the few spikes carry insufficient information. This observation motivates us to develop an SNN that is more reliable under adaptive timesteps by introducing a novel regularisation technique, namely the Spatial-Temporal Regulariser (STR). Our approach regulates the ratio between the strength of spikes and the membrane potential at each timestep. This effectively balances spatial and temporal performance during training, ultimately resulting in an Anytime Optimal Inference (AOI) SNN. Through extensive experiments on frame-based and event-based datasets, our method, in combination with a cutoff based on the softmax output, achieves state-of-the-art performance in terms of both latency and accuracy. Notably, with STR and cutoff, the SNN achieves 2.14x to 2.89x faster inference compared to a pre-configured timestep, with a near-zero accuracy drop of 0.50% to 0.64% on the event-based datasets. Code available: https://github.com/Dengyu-Wu/AOI-SNN-Regularisation
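As an illustration of the idea (not the paper's exact formulation), the sketch below shows a regulariser that penalises per-timestep imbalance in the spike-to-membrane-potential ratio, plus a confidence-based cutoff for anytime inference; `model` is assumed to be a per-timestep callable returning class logits.

```python
import torch

def str_penalty(spikes, membrane, lam=0.1, eps=1e-8):
    """One reading of a Spatial-Temporal Regulariser: penalise deviation of
    the per-timestep ratio between spike strength and membrane-potential
    strength from its mean over time, so no single timestep dominates.

    spikes, membrane: tensors of shape [T, B, N] (timestep, batch, neurons).
    """
    spike_strength = spikes.abs().mean(dim=(1, 2))   # [T]
    mem_strength = membrane.abs().mean(dim=(1, 2))   # [T]
    ratio = spike_strength / (mem_strength + eps)    # [T]
    return lam * ((ratio - ratio.mean()) ** 2).mean()

def cutoff_inference(model, x, threshold=0.9):
    """Anytime inference: stop accumulating timesteps once the softmax
    confidence of the running prediction exceeds a threshold."""
    logits_sum = 0.0
    for t in range(x.shape[0]):                       # x: [T, B, ...]
        logits_sum = logits_sum + model(x[t])         # accumulate step logits
        probs = torch.softmax(logits_sum, dim=-1)
        if probs.max(dim=-1).values.min() > threshold:  # all samples confident
            break
    return logits_sum.argmax(dim=-1), t + 1           # prediction, steps used
```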
Abstract: As Large Language Models (LLMs) become more integrated into our daily lives, it is crucial to identify and mitigate their risks, especially when those risks can have profound impacts on human users and societies. Guardrails, which filter the inputs or outputs of LLMs, have emerged as a core safeguarding technology. This position paper takes a deep look at current open-source solutions (Llama Guard, Nvidia NeMo, Guardrails AI) and discusses the challenges and the road towards building more complete solutions. Drawing on robust evidence from previous research, we advocate a systematic approach to constructing guardrails for LLMs, based on comprehensive consideration of the diverse contexts across various LLM applications. We propose employing socio-technical methods through collaboration with a multi-disciplinary team to pinpoint precise technical requirements, exploring advanced neural-symbolic implementations to embrace the complexity of those requirements, and developing verification and testing to ensure the utmost quality of the final product.
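To illustrate the neural-symbolic direction the paper advocates (an assumed composition, not how Llama Guard, NeMo, or Guardrails AI are actually implemented), here is a sketch in which explicit symbolic rules, elicited through a socio-technical requirements process, are checked before a learned classifier:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    applies: Callable[[str], bool]   # symbolic, context-specific predicate

def moderate(text: str, rules: list[Rule],
             toxicity_model: Callable[[str], float], tau: float = 0.5) -> str:
    # Symbolic layer: explicit, auditable requirements from domain experts.
    for rule in rules:
        if rule.applies(text):
            return f"blocked by rule: {rule.name}"
    # Neural layer: a learned scorer covers cases rules cannot enumerate.
    if toxicity_model(text) > tau:
        return "blocked by learned classifier"
    return "allowed"

# Hypothetical usage: the rule comes from a multi-disciplinary team, the
# scorer from a trained model (stubbed here with a constant score).
rules = [Rule("no medical dosage advice", lambda t: "dosage" in t.lower())]
print(moderate("What dosage should I take?", rules, lambda t: 0.0))
```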
Abstract: Large Language Models (LLMs) have ignited a new heatwave of AI, owing to their ability to engage end-users in human-level conversations with detailed and articulate answers across many knowledge domains. In response to their fast adoption in many industrial applications, this survey concerns their safety and trustworthiness. First, we review known vulnerabilities of LLMs, categorising them into inherent issues, intended attacks, and unintended bugs. Then, we consider if and how the Verification and Validation (V&V) techniques, which have been widely developed for traditional software and for deep learning models such as convolutional neural networks, can be integrated and further extended throughout the lifecycle of LLMs to provide rigorous analysis of the safety and trustworthiness of LLMs and their applications. Specifically, we consider four complementary techniques: falsification and evaluation, verification, runtime monitoring, and ethical use. Considering the fast development of LLMs, this survey does not intend to be complete (although it includes 300 references), especially when it comes to the applications of LLMs in various domains; rather, it is a collection of organised literature reviews and discussions to support a quick understanding of the safety and trustworthiness issues from the perspective of V&V.
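As a flavour of one of the four techniques, the sketch below shows a simple runtime monitor that checks a deployed LLM's outputs against property checks and retries on violation; the `llm` callable and the citation check are hypothetical stand-ins for richer consistency, toxicity, or factuality monitors.

```python
from typing import Callable, List

def monitored_generate(llm: Callable[[str], str], prompt: str,
                       checks: List[Callable[[str], bool]],
                       max_retries: int = 2) -> str:
    """Runtime monitoring in the V&V sense: property checks run alongside
    the deployed model, flagging (here, retrying) violating outputs."""
    for _ in range(max_retries + 1):
        output = llm(prompt)
        if all(check(output) for check in checks):
            return output
    raise RuntimeError("monitor: no output satisfied all runtime properties")

# Hypothetical property: answers must carry a source tag.
has_citation = lambda out: "[source:" in out
```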
Abstract: Large Language Models (LLMs), such as ChatGPT and BERT, are leading a new AI heatwave thanks to their human-like conversations with detailed and articulate answers across many domains of knowledge. While LLMs are quickly being applied to many AI application domains, we are interested in the following question: can safety analysis for safety-critical systems make use of LLMs? To answer it, we conduct a case study of Systems Theoretic Process Analysis (STPA) on Automatic Emergency Brake (AEB) systems using ChatGPT. STPA, one of the most prevalent techniques for hazard analysis, is known to have limitations such as high complexity and subjectivity, which this paper aims to address by exploring the use of ChatGPT. Specifically, three ways of incorporating ChatGPT into STPA are investigated, characterised by its interaction with human experts: one-off simplex interaction, recurring simplex interaction, and recurring duplex interaction. Comparative results reveal that: (i) using ChatGPT without human experts' intervention can be inadequate due to the reliability and accuracy issues of LLMs; (ii) more interactions between ChatGPT and human experts may yield better results; and (iii) using ChatGPT in STPA with extra care can outperform human safety experts alone, as demonstrated by reusing an existing comparison method with baselines. In addition to making the first attempt to apply LLMs in safety analysis, this paper also identifies key challenges (e.g., the trustworthiness concerns of LLMs, the need for standardisation) for future research in this direction.
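The recurring duplex mode can be pictured as the following loop, a sketch in which the hypothetical `ask_llm` and `expert_review` interfaces stand in for a chat API and the human review step:

```python
def stpa_with_llm(ask_llm, expert_review, system_description, rounds=3):
    """Recurring duplex interaction: the LLM drafts STPA artefacts, human
    experts critique them, and the critique feeds the next round.

    ask_llm(prompt) -> str and expert_review(draft) -> (approved, feedback)
    are assumed interfaces, not a real API.
    """
    prompt = (f"Perform STPA step 1 (identify losses and hazards) for:\n"
              f"{system_description}")
    draft = ask_llm(prompt)
    for _ in range(rounds):
        approved, feedback = expert_review(draft)   # duplex: humans talk back
        if approved:
            break
        draft = ask_llm(f"Revise the analysis using this expert feedback:\n"
                        f"{feedback}\n\nPrevious draft:\n{draft}")
    return draft
```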
Abstract: Click-through rate (CTR) prediction is a core task in cost-per-click (CPC) advertising systems and has been studied extensively by machine learning practitioners. While many existing methods have been successfully deployed in practice, most of them are built upon the i.i.d. (independent and identically distributed) assumption, ignoring that the click data used for training and inference is collected over time and is intrinsically non-stationary and drifting. This mismatch inevitably leads to sub-optimal performance. To address this problem, we formulate CTR prediction as a continual learning task and propose COLF, a hybrid COntinual Learning Framework for CTR prediction, which has a memory-based modular architecture designed to adapt, learn, and give predictions continuously when faced with non-stationary, drifting click data streams. Paired with a memory population method that explicitly controls the discrepancy between memory and target data, COLF is able to gain positive knowledge from its historical experience and make improved CTR predictions. Empirical evaluations on click logs collected from a major shopping app in China demonstrate our method's superiority over existing methods. Additionally, we have deployed our method online and observed significant CTR and revenue improvements, which further demonstrates our method's efficacy.
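One plausible way to picture the memory-based replay and the discrepancy-controlled memory population is sketched below; this illustrates the general idea rather than COLF's actual algorithm, and `partial_fit` assumes a scikit-learn-style incremental model.

```python
import numpy as np

def populate_memory(memory_x, memory_y, target_x, capacity):
    """Illustrative memory-population rule: keep stored examples whose
    features lie closest to the current (target) data distribution, so the
    replayed history stays relevant to the drifted stream."""
    target_mean = target_x.mean(axis=0)
    dist = np.linalg.norm(memory_x - target_mean, axis=1)  # discrepancy proxy
    keep = np.argsort(dist)[:capacity]
    return memory_x[keep], memory_y[keep]

def continual_step(model, memory, batch_x, batch_y, capacity=10_000):
    mem_x, mem_y = memory
    # Replay: train on the fresh batch plus the curated memory.
    train_x = np.concatenate([batch_x, mem_x]) if len(mem_x) else batch_x
    train_y = np.concatenate([batch_y, mem_y]) if len(mem_y) else batch_y
    model.partial_fit(train_x, train_y)                    # incremental update
    # Refresh the memory towards the newest distribution.
    mem_x = np.concatenate([mem_x, batch_x]) if len(mem_x) else batch_x
    mem_y = np.concatenate([mem_y, batch_y]) if len(mem_y) else batch_y
    return model, populate_memory(mem_x, mem_y, batch_x, capacity)
```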
Abstract: Click-through rate (CTR) prediction plays an important role in online advertising and recommender systems. In practice, the training of CTR models depends on click data, which is intrinsically biased towards higher positions since a higher position has a higher CTR by nature. Existing methods, such as actual-position training with fixed-position inference and inverse-propensity-weighted training with no position inference, alleviate the bias problem to some extent. However, the different treatment of position information between training and inference inevitably leads to inconsistency and sub-optimal online performance. Meanwhile, the basic assumption of these methods, i.e., that the click probability is the product of the examination probability and the relevance probability, is oversimplified and insufficient to model the rich interaction between position and other information. In this paper, we propose a Deep Position-wise Interaction Network (DPIN) to efficiently combine all candidate items and positions for estimating CTR at each position, achieving consistency between offline and online as well as modeling the deep non-linear interactions among position, user, context, and item within the limits of serving performance. Following our new treatment of position bias in CTR prediction, we propose a new evaluation metric named PAUC (position-wise AUC) that is suitable for measuring the ranking quality at a given position. Through extensive experiments on a real-world dataset, we show empirically that our method is both effective and efficient in solving the position bias problem. We have also deployed our method in production and observed a statistically significant improvement over a highly optimized baseline in a rigorous A/B test.
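Under one plausible reading of the metric, PAUC can be computed by evaluating AUC within each display position and averaging with impression weights, as in this sketch:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def pauc(labels, scores, positions):
    """Position-wise AUC: AUC is computed separately within each display
    position, then averaged weighted by impressions at that position."""
    labels, scores, positions = map(np.asarray, (labels, scores, positions))
    total, weight = 0.0, 0
    for p in np.unique(positions):
        mask = positions == p
        if labels[mask].min() == labels[mask].max():
            continue  # AUC is undefined with only one class at this position
        total += roc_auc_score(labels[mask], scores[mask]) * mask.sum()
        weight += mask.sum()
    return total / weight if weight else float("nan")
```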
Abstract: In spite of its importance, passenger demand prediction is a highly challenging problem, because demand is simultaneously influenced by complex interactions among many spatial and temporal factors as well as external factors such as weather. To address this problem, we propose a Spatio-TEmporal Fuzzy neural Network (STEF-Net) that accurately predicts passenger demand by incorporating the complex interactions of all known important factors. We design an end-to-end learning framework with different neural networks modeling different factors. Specifically, we propose to capture spatio-temporal feature interactions via a convolutional long short-term memory network and to model external factors via a fuzzy neural network, which handles data uncertainty significantly better than deterministic methods. To preserve the temporal relations when fusing the two networks and to emphasize discriminative spatio-temporal feature interactions, we employ a novel feature fusion method with a convolution operation and an attention layer. To the best of our knowledge, our work is the first to fuse a deep recurrent neural network and a fuzzy neural network to model complex spatial-temporal feature interactions with additional uncertain input features for predictive learning. Experiments on a large-scale real-world dataset show that our model achieves more than 10% improvement over state-of-the-art approaches.
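The convolution-plus-attention fusion can be sketched as follows; channel sizes and layer shapes are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AttentiveFusion(nn.Module):
    """Sketch of the fusion idea: a learned attention map weights the two
    branch outputs per grid cell, and a 1x1 convolution mixes the result."""
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolution fuses the re-weighted branches per location.
        self.conv = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # Attention layer: one weight per branch at each spatial location.
        self.attn = nn.Sequential(nn.Conv2d(2 * channels, 2, kernel_size=1),
                                  nn.Softmax(dim=1))

    def forward(self, spatio_temporal, external):
        # spatio_temporal: ConvLSTM features; external: fuzzy-network
        # features, both [B, C, H, W] after broadcasting over the grid.
        both = torch.cat([spatio_temporal, external], dim=1)  # [B, 2C, H, W]
        w = self.attn(both)                                   # [B, 2, H, W]
        weighted = torch.cat([w[:, 0:1] * spatio_temporal,
                              w[:, 1:2] * external], dim=1)   # re-weight branches
        return self.conv(weighted)                            # [B, C, H, W]
```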