Lei Sha

Plug-and-Play Training Framework for Preference Optimization

Dec 30, 2024

DiffusionAttacker: Diffusion-Driven Prompt Manipulation for LLM Jailbreak

Dec 23, 2024

Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues

Oct 14, 2024

BlackDAN: A Black-Box Multi-Objective Approach for Effective and Contextual Jailbreaking of Large Language Models

Oct 13, 2024

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models

Oct 10, 2024

Towards a Unified View of Preference Learning for Large Language Models: A Survey

Sep 04, 2024

HSF: Defending against Jailbreak Attacks with Hidden State Filtering

Aug 31, 2024

ATM: Adversarial Tuning Multi-agent System Makes a Robust Retrieval-Augmented Generator

May 28, 2024

Unlocking Multi-View Insights in Knowledge-Dense Retrieval-Augmented Generation

Apr 19, 2024

ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors

Feb 26, 2024