Vivek Hebbar

Diffuse AI Control on Fuzzy Tasks

Jun 08, 2026
Via arXiv

Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats

Nov 26, 2024
Via arXiv

Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs

Jul 22, 2024
Via arXiv