Zhichen Dong

Emergent Response Planning in LLM

Feb 10, 2025
Via arXiv

Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models

May 29, 2024
Via arXiv

Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!

Feb 21, 2024
Via arXiv

Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey

Feb 14, 2024
Via arXiv