Justin Wang

GRASP: Municipal Budget AI Chatbots for Enhancing Civic Engagement

Mar 30, 2025
Via arXiv

Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarsity

Feb 17, 2025
Via arXiv

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

Oct 11, 2024
Via arXiv

$\textbf{Only-IF}$: Revealing the Decisive Effect of Instruction Diversity on Generalization

Oct 07, 2024
Via arXiv

MAGICS: Adversarial RL with Minimax Actors Guided by Implicit Critic Stackelberg for Convergent Neural Synthesis of Robot Safety

Sep 20, 2024
Via arXiv

Tamper-Resistant Safeguards for Open-Weight LLMs

Aug 01, 2024
Via arXiv

Improving Alignment and Robustness with Circuit Breakers

Jun 10, 2024
Via arXiv

Improving Alignment and Robustness with Short Circuiting

Jun 06, 2024
Via arXiv

From Symbolic Tasks to Code Generation: Diversification Yields Better Task Performers

May 31, 2024
Via arXiv

Instruction Diversity Drives Generalization To Unseen Tasks

Feb 16, 2024
Via arXiv