Aradhana Sinha

InfAlign: Inference-aware language model alignment

Dec 27, 2024
Via arXiv

Automated Adversarial Discovery for Safety Classifiers

Jun 24, 2024
Via arXiv

Generalized People Diversity: Learning a Human Perception-Aligned Diversity Representation for People Images

Jan 25, 2024
Via arXiv

Improving Few-shot Generalization of Safety Classifiers via Data Augmented Parameter-Efficient Fine-Tuning

Oct 25, 2023
Via arXiv

Break it, Imitate it, Fix it: Robustness by Generating Human-Like Attacks

Oct 25, 2023
Via arXiv