Lei Sha

AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement

Feb 24, 2025
Via arXiv

Be a Multitude to Itself: A Prompt Evolution Framework for Red Teaming

Feb 22, 2025
Via arXiv

How Far are LLMs from Being Our Digital Twins? A Benchmark for Persona-Based Behavior Chain Simulation

Feb 20, 2025
Via arXiv

Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking

Feb 18, 2025
Via arXiv

Plug-and-Play Training Framework for Preference Optimization

Dec 30, 2024
Via arXiv

DiffusionAttacker: Diffusion-Driven Prompt Manipulation for LLM Jailbreak

Dec 23, 2024
Via arXiv

Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues

Oct 14, 2024
Via arXiv

BlackDAN: A Black-Box Multi-Objective Approach for Effective and Contextual Jailbreaking of Large Language Models

Oct 13, 2024
Via arXiv

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models

Oct 10, 2024
Via arXiv

Towards a Unified View of Preference Learning for Large Language Models: A Survey

Sep 04, 2024
Via arXiv