Aradhana Sinha

Automated Adversarial Discovery for Safety Classifiers

Jun 24, 2024
Via arXiv

Generalized People Diversity: Learning a Human Perception-Aligned Diversity Representation for People Images

Jan 25, 2024
Via arXiv

Break it, Imitate it, Fix it: Robustness by Generating Human-Like Attacks

Oct 25, 2023
Via arXiv

Improving Few-shot Generalization of Safety Classifiers via Data Augmented Parameter-Efficient Fine-Tuning

Oct 25, 2023
Via arXiv