Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hsuan Su

Jailbreaking with Universal Multi-Prompts

Feb 03, 2025

Yu-Ling Hsu, Hsuan Su, Shang-Tse Chen

Abstract:Large language models (LLMs) have seen rapid development in recent years, revolutionizing various applications and significantly enhancing convenience and productivity. However, alongside their impressive capabilities, ethical concerns and new types of attacks, such as jailbreaking, have emerged. While most prompting techniques focus on optimizing adversarial inputs for individual cases, resulting in higher computational costs when dealing with large datasets. Less research has addressed the more general setting of training a universal attacker that can transfer to unseen tasks. In this paper, we introduce JUMP, a prompt-based method designed to jailbreak LLMs using universal multi-prompts. We also adapt our approach for defense, which we term DUMP. Experimental results demonstrate that our method for optimizing universal multi-prompts outperforms existing techniques.

* Accepted by NAACL Findings 2025

Via

Access Paper or Ask Questions

Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging

Dec 27, 2024

Hua Farn, Hsuan Su, Shachi H Kumar, Saurav Sahay, Shang-Tse Chen, Hung-yi Lee

Figure 1 for Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging

Figure 2 for Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging

Figure 3 for Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging

Figure 4 for Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging

Abstract:Fine-tuning large language models (LLMs) for downstream tasks is a widely adopted approach, but it often leads to safety degradation in safety-aligned LLMs. Currently, many solutions address this issue by incorporating additional safety data, which can be impractical in many cases. In this paper, we address the question: How can we improve downstream task performance while preserving safety in LLMs without relying on additional safety data? We propose a simple and effective method that maintains the inherent safety of LLMs while enhancing their downstream task performance: merging the weights of pre- and post-fine-tuned safety-aligned models. Experimental results across various downstream tasks, models, and merging methods demonstrate that this approach effectively mitigates safety degradation while improving downstream task performance, offering a practical solution for adapting safety-aligned LLMs.

Via

Access Paper or Ask Questions

Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models

Aug 07, 2024

Shachi H Kumar, Saurav Sahay, Sahisnu Mazumder, Eda Okur, Ramesh Manuvinakurike, Nicole Beckage, Hsuan Su, Hung-yi Lee, Lama Nachman

Figure 1 for Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models

Figure 2 for Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models

Figure 3 for Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models

Figure 4 for Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models

Abstract:Large Language Models (LLMs) have excelled at language understanding and generating human-level text. However, even with supervised training and human alignment, these LLMs are susceptible to adversarial attacks where malicious users can prompt the model to generate undesirable text. LLMs also inherently encode potential biases that can cause various harmful effects during interactions. Bias evaluation metrics lack standards as well as consensus and existing methods often rely on human-generated templates and annotations which are expensive and labor intensive. In this work, we train models to automatically create adversarial prompts to elicit biased responses from target LLMs. We present LLM- based bias evaluation metrics and also analyze several existing automatic evaluation methods and metrics. We analyze the various nuances of model responses, identify the strengths and weaknesses of model families, and assess where evaluation methods fall short. We compare these metrics to human evaluation and validate that the LLM-as-a-Judge metric aligns with human judgement on bias in response generation.

* 6 pages paper content, 17 pages of appendix

Via

Access Paper or Ask Questions

SYN2REAL: Leveraging Task Arithmetic for Mitigating Synthetic-Real Discrepancies in ASR Domain Adaptation

Jun 05, 2024

Hsuan Su, Hua Farn, Shang-Tse Chen, Hung-yi Lee

Figure 1 for SYN2REAL: Leveraging Task Arithmetic for Mitigating Synthetic-Real Discrepancies in ASR Domain Adaptation

Figure 2 for SYN2REAL: Leveraging Task Arithmetic for Mitigating Synthetic-Real Discrepancies in ASR Domain Adaptation

Figure 3 for SYN2REAL: Leveraging Task Arithmetic for Mitigating Synthetic-Real Discrepancies in ASR Domain Adaptation

Figure 4 for SYN2REAL: Leveraging Task Arithmetic for Mitigating Synthetic-Real Discrepancies in ASR Domain Adaptation

Abstract:Recent advancements in large language models (LLMs) have introduced the 'task vector' concept, which has significantly impacted various domains but remains underexplored in speech recognition. This paper presents a novel 'SYN2REAL' task vector for domain adaptation in automatic speech recognition (ASR), specifically targeting text-only domains. Traditional fine-tuning on synthetic speech often results in performance degradation due to acoustic mismatches. To address this issue, we propose creating a 'SYN2REAL' vector by subtracting the parameter differences between models fine-tuned on real and synthetic speech. This vector effectively bridges the gap between the two domains. Experiments on the SLURP dataset demonstrate that our approach yields an average improvement of 11.15% in word error rate for unseen target domains, highlighting the potential of task vectors in enhancing speech domain adaptation.

Via

Access Paper or Ask Questions

Step by Step to Fairness: Attributing Societal Bias in Task-oriented Dialogue Systems

Nov 14, 2023

Hsuan Su, Rebecca Qian, Chinnadhurai Sankar, Shahin Shayandeh, Shang-Tse Chen, Hung-yi Lee, Daniel M. Bikel

Figure 1 for Step by Step to Fairness: Attributing Societal Bias in Task-oriented Dialogue Systems

Figure 2 for Step by Step to Fairness: Attributing Societal Bias in Task-oriented Dialogue Systems

Figure 3 for Step by Step to Fairness: Attributing Societal Bias in Task-oriented Dialogue Systems

Figure 4 for Step by Step to Fairness: Attributing Societal Bias in Task-oriented Dialogue Systems

Abstract:Recent works have shown considerable improvements in task-oriented dialogue (TOD) systems by utilizing pretrained large language models (LLMs) in an end-to-end manner. However, the biased behavior of each component in a TOD system and the error propagation issue in the end-to-end framework can lead to seriously biased TOD responses. Existing works of fairness only focus on the total bias of a system. In this paper, we propose a diagnosis method to attribute bias to each component of a TOD system. With the proposed attribution method, we can gain a deeper understanding of the sources of bias. Additionally, researchers can mitigate biased model behavior at a more granular level. We conduct experiments to attribute the TOD system's bias toward three demographic axes: gender, age, and race. Experimental results show that the bias of a TOD system usually comes from the response generation model.

Via

Access Paper or Ask Questions

Learning from Red Teaming: Gender Bias Provocation and Mitigation in Large Language Models

Oct 17, 2023

Hsuan Su, Cheng-Chu Cheng, Hua Farn, Shachi H Kumar, Saurav Sahay, Shang-Tse Chen, Hung-yi Lee

Figure 1 for Learning from Red Teaming: Gender Bias Provocation and Mitigation in Large Language Models

Figure 2 for Learning from Red Teaming: Gender Bias Provocation and Mitigation in Large Language Models

Figure 3 for Learning from Red Teaming: Gender Bias Provocation and Mitigation in Large Language Models

Figure 4 for Learning from Red Teaming: Gender Bias Provocation and Mitigation in Large Language Models

Abstract:Recently, researchers have made considerable improvements in dialogue systems with the progress of large language models (LLMs) such as ChatGPT and GPT-4. These LLM-based chatbots encode the potential biases while retaining disparities that can harm humans during interactions. The traditional biases investigation methods often rely on human-written test cases. However, these test cases are usually expensive and limited. In this work, we propose a first-of-its-kind method that automatically generates test cases to detect LLMs' potential gender bias. We apply our method to three well-known LLMs and find that the generated test cases effectively identify the presence of biases. To address the biases identified, we propose a mitigation strategy that uses the generated test cases as demonstrations for in-context learning to circumvent the need for parameter fine-tuning. The experimental results show that LLMs generate fairer responses with the proposed approach.

Via

Access Paper or Ask Questions

Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models

Sep 18, 2023

Hsuan Su, Ting-Yao Hu, Hema Swetha Koppula, Raviteja Vemulapalli, Jen-Hao Rick Chang, Karren Yang, Gautam Varma Mantena, Oncel Tuzel

Abstract:While Automatic Speech Recognition (ASR) systems are widely used in many real-world applications, they often do not generalize well to new domains and need to be finetuned on data from these domains. However, target-domain data usually are not readily available in many scenarios. In this paper, we propose a new strategy for adapting ASR models to new target domains without any text or speech from those domains. To accomplish this, we propose a novel data synthesis pipeline that uses a Large Language Model (LLM) to generate a target domain text corpus, and a state-of-the-art controllable speech synthesis model to generate the corresponding speech. We propose a simple yet effective in-context instruction finetuning strategy to increase the effectiveness of LLM in generating text corpora for new domains. Experiments on the SLURP dataset show that the proposed method achieves an average relative word error rate improvement of $28\%$ on unseen target domains without any performance drop in source domains.

Via

Access Paper or Ask Questions

Position Matters! Empirical Study of Order Effect in Knowledge-grounded Dialogue

Feb 12, 2023

Hsuan Su, Shachi H Kumar, Sahisnu Mazumder, Wenda Chen, Ramesh Manuvinakurike, Eda Okur, Saurav Sahay, Lama Nachman, Shang-Tse Chen, Hung-yi Lee

Figure 1 for Position Matters! Empirical Study of Order Effect in Knowledge-grounded Dialogue

Figure 2 for Position Matters! Empirical Study of Order Effect in Knowledge-grounded Dialogue

Figure 3 for Position Matters! Empirical Study of Order Effect in Knowledge-grounded Dialogue

Figure 4 for Position Matters! Empirical Study of Order Effect in Knowledge-grounded Dialogue

Abstract:With the power of large pretrained language models, various research works have integrated knowledge into dialogue systems. The traditional techniques treat knowledge as part of the input sequence for the dialogue system, prepending a set of knowledge statements in front of dialogue history. However, such a mechanism forces knowledge sets to be concatenated in an ordered manner, making models implicitly pay imbalanced attention to the sets during training. In this paper, we first investigate how the order of the knowledge set can influence autoregressive dialogue systems' responses. We conduct experiments on two commonly used dialogue datasets with two types of transformer-based models and find that models view the input knowledge unequally. To this end, we propose a simple and novel technique to alleviate the order effect by modifying the position embeddings of knowledge input in these models. With the proposed position embedding method, the experimental results show that each knowledge statement is uniformly considered to generate responses.

Via

Access Paper or Ask Questions

Few-shot Prompting Towards Controllable Response Generation

Jun 09, 2022

Hsuan Su, Pohan Chi, Shih-Cheng Huang, Chung Ho Lam, Saurav Sahay, Shang-Tse Chen, Hung-yi Lee

Figure 1 for Few-shot Prompting Towards Controllable Response Generation

Figure 2 for Few-shot Prompting Towards Controllable Response Generation

Figure 3 for Few-shot Prompting Towards Controllable Response Generation

Figure 4 for Few-shot Prompting Towards Controllable Response Generation

Abstract:Much literature has shown that prompt-based learning is an efficient method to make use of the large pre-trained language model. Recent works also exhibit the possibility of steering a chatbot's output by plugging in an appropriate prompt. Gradient-based methods are often used to perturb the prompts. However, some language models are not even available to the public. In this work, we first explored the combination of prompting and reinforcement learning (RL) to steer models' generation without accessing any of the models' parameters. Second, to reduce the training effort and enhance the generalizability to the unseen task, we apply multi-task learning to make the model learn to generalize to new tasks better. The experiment results show that our proposed method can successfully control several state-of-the-art (SOTA) dialogue models without accessing their parameters. Furthermore, the model demonstrates the strong ability to quickly adapt to an unseen task in fewer steps than the baseline model.

Via

Access Paper or Ask Questions

Controllable Response Generation for Assistive Use-cases

Dec 04, 2021

Shachi H Kumar, Hsuan Su, Ramesh Manuvinakurike, Saurav Sahay, Lama Nachman

Figure 1 for Controllable Response Generation for Assistive Use-cases

Figure 2 for Controllable Response Generation for Assistive Use-cases

Figure 3 for Controllable Response Generation for Assistive Use-cases

Figure 4 for Controllable Response Generation for Assistive Use-cases

Abstract:Conversational agents have become an integral part of the general population for simple task enabling situations. However, these systems are yet to have any social impact on the diverse and minority population, for example, helping people with neurological disorders, for example ALS, and people with speech, language and social communication disorders. Language model technology can play a huge role to help these users carry out daily communication and social interactions. To enable this population, we build a dialog system that can be controlled by users using cues or keywords. We build models that can suggest relevant cues in the dialog response context which is used to control response generation and can speed up communication. We also introduce a keyword loss to lexically constrain the model output. We show both qualitatively and quantitatively that our models can effectively induce the keyword into the model response without degrading the quality of response. In the context of usage of such systems for people with degenerative disorders, we present human evaluation of our cue or keyword predictor and the controllable dialog system and show that our models perform significantly better than models without control. Our study shows that keyword control on end to end response generation models is powerful and can enable and empower users with degenerative disorders to carry out their day to day communication.

Via

Access Paper or Ask Questions