
Sahil Agarwal

Investigating Implicit Bias in Large Language Models: A Large-Scale Study of Over 50 LLMs

Oct 13, 2024

SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming

Aug 14, 2024

Increased LLM Vulnerabilities from Fine-tuning and Quantization

Apr 05, 2024