Zhichen Dong

Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models

May 29, 2024
Via arXiv

Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!

Feb 21, 2024
Via arXiv

Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey

Feb 14, 2024
Via arXiv