Terry Tong

Unraveling Indirect In-Context Learning Using Influence Functions

Jan 01, 2025

Hadi Askari, Shivanshu Gupta, Terry Tong, Fei Wang, Anshuman Chhabra, Muhao Chen

Abstract: This work introduces a novel paradigm for generalized In-Context Learning (ICL), termed Indirect In-Context Learning. In Indirect ICL, we explore demonstration selection strategies tailored for two distinct real-world scenarios: Mixture of Tasks and Noisy Demonstrations. We systematically evaluate the effectiveness of Influence Functions (IFs) as a selection tool for these settings, highlighting the potential for IFs to better capture the informativeness of examples within the demonstration pool. For the Mixture of Tasks setting, demonstrations are drawn from 28 diverse tasks, including MMLU, BigBench, StrategyQA, and CommonsenseQA. We demonstrate that combining BertScore-Recall (BSR) with an IF surrogate model can significantly improve performance, leading to average absolute accuracy gains of 0.37% and 1.45% for 3-shot and 5-shot setups when compared to traditional ICL metrics. In the Noisy Demonstrations setting, we examine scenarios where demonstrations might be mislabeled. Our experiments show that reweighting traditional ICL selectors (BSR and Cosine Similarity) with IF-based selectors boosts accuracy by an average of 2.90% for Cosine Similarity and 2.94% for BSR on noisy GLUE benchmarks. In sum, we propose a robust framework for demonstration selection that generalizes beyond traditional ICL, offering valuable insights into the role of IFs for Indirect ICL.
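
To make the idea of reweighting a similarity-based selector with influence scores more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names, the min-max normalization, the linear mixing weight `alpha`, and the convention that a higher influence score means a more helpful demonstration are all assumptions made for illustration. The abstract does not specify how the two signals are combined, and the influence scores are taken as given rather than computed.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def min_max(xs):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.5] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def select_demonstrations(query_vec, pool_vecs, influence, k=3, alpha=0.5):
    """Return indices of the k demonstrations with the best combined score.

    influence[j] is a precomputed, hypothetical influence score for
    demonstration j (assumed: higher means more helpful).
    alpha = 0 reduces to plain cosine-similarity selection.
    """
    sims = min_max([cosine(query_vec, v) for v in pool_vecs])
    infl = min_max(influence)
    combined = [(1 - alpha) * s + alpha * i for s, i in zip(sims, infl)]
    ranked = sorted(range(len(pool_vecs)), key=lambda j: combined[j], reverse=True)
    return ranked[:k]

# Toy pool: demonstration 0 is closest to the query but has the lowest
# (assumed) influence score, e.g. because it is mislabeled.
query = [1.0, 0.0]
pool = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
influence_scores = [-1.0, 1.0, 0.5]

similarity_only = select_demonstrations(query, pool, influence_scores, k=2, alpha=0.0)
reweighted = select_demonstrations(query, pool, influence_scores, k=2, alpha=0.5)
```

In this toy setup the similarity-only ranking puts demonstration 0 first, while the reweighted ranking demotes it below demonstration 1. That is the kind of behavior one would hope for when the most similar example is noisy.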

* Under Review

Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges

Sep 30, 2024

Qin Liu, Wenjie Mo, Terry Tong, Jiashu Xu, Fei Wang, Chaowei Xiao, Muhao Chen

Abstract: The advancement of Large Language Models (LLMs) has significantly impacted various domains, including Web search, healthcare, and software development. However, as these models scale, they become more vulnerable to cybersecurity risks, particularly backdoor attacks. By exploiting the potent memorization capacity of LLMs, adversaries can easily inject backdoors into LLMs by manipulating a small portion of training data, leading to malicious behaviors in downstream applications whenever the hidden backdoor is activated by the pre-defined triggers. Moreover, emerging learning paradigms like instruction tuning and reinforcement learning from human feedback (RLHF) exacerbate these risks as they rely heavily on crowdsourced data and human feedback, which are not fully controlled. In this paper, we present a comprehensive survey of emerging backdoor threats to LLMs that appear during LLM development or inference, and cover recent advances in both defense and detection strategies for mitigating backdoor threats to LLMs. We also outline key challenges in addressing these threats, highlighting areas for future research.

* The 60th Annual Allerton Conference (Invited Paper). The arXiv version is a pre-IEEE Press publication version

Securing Multi-turn Conversational Language Models Against Distributed Backdoor Triggers

Jul 04, 2024

Terry Tong, Jiashu Xu, Qin Liu, Muhao Chen

Abstract: The security of multi-turn conversational large language models (LLMs) is understudied despite this being one of the most popular uses of LLMs. Specifically, LLMs are vulnerable to data poisoning backdoor attacks, where an adversary manipulates the training data to cause the model to output malicious responses to predefined triggers. Specific to the multi-turn dialogue setting, LLMs are at risk of even more harmful and stealthy backdoor attacks where the backdoor triggers may span multiple utterances, giving leeway to context-driven attacks. In this paper, we explore a novel distributed backdoor trigger attack that serves as an extra tool in an adversary's toolbox and can interface with other single-turn attack strategies in a plug-and-play manner. Results on two representative defense mechanisms indicate that distributed backdoor triggers are robust against existing defense strategies, which are designed for single-turn user-model interactions, motivating us to propose a new defense strategy for the more challenging multi-turn dialogue setting. To this end, we also explore a novel contrastive-decoding-based defense that is able to mitigate the backdoor with a low computational tradeoff.
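
The abstract does not describe how the contrastive-decoding-based defense is built. For orientation only, the sketch below shows contrastive decoding in its generic form: the next token is chosen by comparing the log-probabilities of two next-token distributions, restricted to tokens the first distribution considers plausible. The names `expert_logits`, `amateur_logits`, and `alpha`, and the toy numbers, are illustrative assumptions. How the two distributions would be obtained in a backdoor-defense setting is not stated in the abstract and is not modeled here.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def contrastive_next_token(expert_logits, amateur_logits, alpha=0.1):
    """Pick the token maximizing log p_expert - log p_amateur.

    Only tokens whose expert probability is at least alpha times the
    expert's top probability are eligible (a plausibility constraint).
    Ties are broken in favor of the lowest token index.
    """
    expert_lp = log_softmax(expert_logits)
    amateur_lp = log_softmax(amateur_logits)
    cutoff = max(expert_lp) + math.log(alpha)

    best_token, best_score = None, -math.inf
    for token, (e, a) in enumerate(zip(expert_lp, amateur_lp)):
        if e < cutoff:
            continue  # implausible under the expert distribution
        score = e - a
        if score > best_score:
            best_token, best_score = token, score
    return best_token

# Toy vocabulary of three tokens (hypothetical logits).
expert = [2.0, 1.9, -5.0]
amateur = [2.0, 0.0, 3.0]

greedy_choice = max(range(len(expert)), key=lambda t: expert[t])
contrastive_choice = contrastive_next_token(expert, amateur)
```

Here greedy decoding on the first distribution would pick token 0, while the contrastive score prefers token 1, which the second distribution rates much lower. Token 2 is excluded by the plausibility cutoff even though the second distribution favors it.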

* Submitted to EMNLP 2024
