Huijie Lv

Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric

Feb 25, 2025

Yuming Yang, Yang Nan, Junjie Ye, Shihan Dou, Xiao Wang, Shuo Li, Huijie Lv, Tao Gui, Qi Zhang, Xuanjing Huang

Abstract: Data diversity is crucial for the instruction tuning of large language models. Existing studies have explored various diversity-aware data selection methods to construct high-quality datasets and enhance model performance. However, the fundamental problem of precisely defining and measuring data diversity remains underexplored, limiting clear guidance for data engineering. To address this, we systematically analyze 11 existing diversity measurement methods by evaluating their correlation with model performance through extensive fine-tuning experiments. Our results indicate that a reliable diversity measure should properly account for both inter-sample differences and the information distribution in the sample space. Building on this, we propose NovelSum, a new diversity metric based on sample-level "novelty." Experiments on both simulated and real-world data show that NovelSum accurately captures diversity variations and achieves a 0.97 correlation with instruction-tuned model performance, highlighting its value in guiding data engineering practices. With NovelSum as an optimization objective, we further develop a greedy, diversity-oriented data selection strategy that outperforms existing approaches, validating both the effectiveness and practical significance of our metric.

* 15 pages. The related codes and resources will be released later. Project page: https://github.com/UmeanNever/NovelSum
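
The abstract describes a greedy, diversity-oriented selection strategy but does not give the NovelSum formula. The sketch below is therefore only a generic stand-in, not the paper's method: it assumes each sample is an embedding vector, and it scores a candidate's "novelty" as its Euclidean distance to the nearest already-selected sample. The names `novelty` and `greedy_select`, the distance choice, and the seeding rule are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def novelty(candidate, selected):
    """Assumed novelty score: distance to the nearest already-selected sample.

    This is a simple proxy, not the NovelSum definition from the paper.
    """
    if not selected:
        return float("inf")
    return min(euclidean(candidate, s) for s in selected)


def greedy_select(embeddings, k):
    """Greedily pick k sample indices, adding the most novel sample each step."""
    if k <= 0 or not embeddings:
        return []
    k = min(k, len(embeddings))
    chosen = [0]  # arbitrary seed: start from the first sample
    while len(chosen) < k:
        selected_vecs = [embeddings[i] for i in chosen]
        remaining = [i for i in range(len(embeddings)) if i not in chosen]
        best = max(remaining, key=lambda i: novelty(embeddings[i], selected_vecs))
        chosen.append(best)
    return chosen


# Toy data: three near-duplicates around the origin and two distant points.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 0.1), (10.0, 0.0)]
picked = greedy_select(points, 3)
```

On the toy data, the near-duplicates of the seed are skipped in favor of the two distant points, which is the qualitative behavior one would expect from any selection rule that rewards inter-sample differences.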

SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance

Jun 26, 2024

Caishuang Huang, Wanxu Zhao, Rui Zheng, Huijie Lv, Shihan Dou, Sixian Li, Xiao Wang, Enyu Zhou, Junjie Ye, Yuming Yang (+3 more)

Abstract: As the development of large language models (LLMs) rapidly advances, securing these models effectively without compromising their utility has become a pivotal area of research. However, current defense strategies against jailbreak attacks (i.e., efforts to bypass security protocols) often suffer from limited adaptability, restricted general capability, and high cost. To address these challenges, we introduce SafeAligner, a methodology implemented at the decoding stage to fortify defenses against jailbreak attacks. We begin by developing two specialized models: the Sentinel Model, which is trained to foster safety, and the Intruder Model, designed to generate riskier responses. SafeAligner leverages the disparity in security levels between the responses from these models to differentiate between harmful and beneficial tokens, effectively guiding the safety alignment by altering the output token distribution of the target model. Extensive experiments show that SafeAligner can increase the likelihood of beneficial tokens, while reducing the occurrence of harmful ones, thereby ensuring secure alignment with minimal loss to generality.
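
The abstract says the target model's output token distribution is altered using the disparity between the Sentinel and Intruder models, but it does not state the update rule. As a rough illustration only, the sketch below assumes an additive rule: shift each token's probability by `alpha` times the difference between the two auxiliary models' probabilities, clip at zero, and renormalize. The function name `adjust_distribution`, the parameter `alpha`, and the additive form are all hypothetical and may differ from the paper.

```python
def adjust_distribution(p_target, p_sentinel, p_intruder, alpha=1.0):
    """Shift the target distribution toward tokens the safer model prefers.

    Assumed rule (not taken from the paper):
        p'(t) is proportional to max(p_target(t) + alpha * (p_sentinel(t) - p_intruder(t)), 0)
    All three inputs are probability lists over the same vocabulary.
    """
    if not (len(p_target) == len(p_sentinel) == len(p_intruder)):
        raise ValueError("all distributions must cover the same vocabulary")
    shifted = [
        max(t + alpha * (s - i), 0.0)
        for t, s, i in zip(p_target, p_sentinel, p_intruder)
    ]
    total = sum(shifted)
    if total == 0.0:
        # Degenerate case: fall back to the unmodified target distribution.
        return list(p_target)
    return [v / total for v in shifted]


# Toy two-token vocabulary: token 0 is favored by the safer model,
# token 1 by the riskier model.
adjusted = adjust_distribution(
    p_target=[0.5, 0.5],
    p_sentinel=[0.8, 0.2],
    p_intruder=[0.2, 0.8],
    alpha=0.5,
)
```

In this toy case, probability mass moves from token 1 to token 0, matching the abstract's description of raising the likelihood of beneficial tokens while lowering that of harmful ones. Setting `alpha=0` leaves the target distribution unchanged.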

CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models

Feb 26, 2024

Huijie Lv, Xiao Wang, Yuansen Zhang, Caishuang Huang, Shihan Dou, Junjie Ye, Tao Gui, Qi Zhang, Xuanjing Huang

Abstract: Adversarial misuse, particularly through "jailbreaking" that circumvents a model's safety and ethical protocols, poses a significant challenge for Large Language Models (LLMs). This paper delves into the mechanisms behind such successful attacks, introducing a hypothesis for the safety mechanism of aligned LLMs: intent security recognition followed by response generation. Grounded in this hypothesis, we propose CodeChameleon, a novel jailbreak framework based on personalized encryption tactics. To elude the intent security recognition phase, we reformulate tasks into a code completion format, enabling users to encrypt queries using personalized encryption functions. To guarantee response generation functionality, we embed a decryption function within the instructions, which allows the LLM to decrypt and execute the encrypted queries successfully. We conduct extensive experiments on 7 LLMs, achieving state-of-the-art average Attack Success Rate (ASR). Remarkably, our method achieves an 86.6% ASR on GPT-4-1106.
