Jingkun Tang

Effective and Efficient Adversarial Detection for Vision-Language Models via A Single Vector

Oct 30, 2024
Via arXiv

Dishonesty in Helpful and Harmless Alignment

Jun 04, 2024
Via arXiv