Xiaoya Lu

Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues

Oct 14, 2024

Qibing Ren, Hao Li, Dongrui Liu, Zhanxu Xie, Xiaoya Lu, Yu Qiao, Lei Sha, Junchi Yan, Lizhuang Ma, Jing Shao

Abstract: This study exposes the safety vulnerabilities of Large Language Models (LLMs) in multi-turn interactions, where malicious users can obscure harmful intents across several queries. We introduce ActorAttack, a novel multi-turn attack method inspired by actor-network theory, which models a network of semantically linked actors as attack clues to generate diverse and effective attack paths toward harmful targets. ActorAttack addresses two main challenges in multi-turn attacks: (1) concealing harmful intents by creating an innocuous conversation topic about the actor, and (2) uncovering diverse attack paths towards the same harmful target by leveraging LLMs' knowledge to specify the correlated actors as various attack clues. In this way, ActorAttack outperforms existing single-turn and multi-turn attack methods across advanced aligned LLMs, even for GPT-o1. We will publish a dataset called SafeMTData, which includes multi-turn adversarial prompts and safety alignment data, generated by ActorAttack. We demonstrate that models safety-tuned using our safety dataset are more robust to multi-turn attacks. Code is available at https://github.com/renqibing/ActorAttack.

RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents

Mar 28, 2024

Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, Hao Shu Fang, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao(+2 more)

Abstract: The ultimate goal of robotic learning is to acquire a comprehensive and generalizable robotic system capable of performing both seen skills within the training distribution and unseen skills in novel environments. Recent progress in utilizing language models as high-level planners has demonstrated that the complexity of tasks can be reduced by decomposing them into primitive-level plans, making it possible to generalize to novel robotic tasks in a composable manner. Despite the promising future, the community is not yet adequately prepared for composable generalization agents, particularly due to the lack of primitive-level real-world robotic datasets. In this paper, we propose a primitive-level robotic dataset, namely RH20T-P, which contains about 33,000 video clips covering 44 diverse and complicated robotic tasks. Each clip is manually annotated according to a set of meticulously designed primitive skills, facilitating the future development of composable generalization agents. To validate the effectiveness of RH20T-P, we also construct a potential and scalable agent based on RH20T-P, called RA-P. Equipped with two planners specialized in task decomposition and motion planning, RA-P can adapt to novel physical skills through composable generalization. Our website and videos can be found at https://sites.google.com/view/rh20t-primitive/main. Dataset and code will be made available soon.

* 24 pages, 12 figures, 6 tables
