Justin Wang

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

Oct 11, 2024
Via arXiv

Only-IF: Revealing the Decisive Effect of Instruction Diversity on Generalization

Oct 07, 2024
Via arXiv

MAGICS: Adversarial RL with Minimax Actors Guided by Implicit Critic Stackelberg for Convergent Neural Synthesis of Robot Safety

Sep 20, 2024
Via arXiv

Tamper-Resistant Safeguards for Open-Weight LLMs

Aug 01, 2024
Via arXiv

Improving Alignment and Robustness with Circuit Breakers

Jun 10, 2024
Via arXiv

Improving Alignment and Robustness with Short Circuiting

Jun 06, 2024
Via arXiv

From Symbolic Tasks to Code Generation: Diversification Yields Better Task Performers

May 31, 2024
Via arXiv

Instruction Diversity Drives Generalization To Unseen Tasks

Feb 16, 2024
Via arXiv

3D Pose Detection in Videos: Focusing on Occlusion

Jun 24, 2020
Via arXiv