Dawn Song

University of California, Berkeley

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models

Mar 19, 2025
Via arXiv

Improving LLM Safety Alignment with Dual-Objective Optimization

Mar 05, 2025
Via arXiv

Scalable Best-of-N Selection for Large Language Models via Self-Certainty

Feb 25, 2025
Via arXiv

The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1

Feb 18, 2025
Via arXiv

International AI Safety Report

Jan 29, 2025
Via arXiv

Can LLMs Design Good Questions Based on Context?

Jan 07, 2025
Via arXiv

Formal Mathematical Reasoning: A New Frontier in AI

Dec 20, 2024
Via arXiv

Capturing the Temporal Dependence of Training Data Influence

Dec 12, 2024
Via arXiv

Data Free Backdoor Attacks

Dec 09, 2024
Via arXiv

Boosting Alignment for Post-Unlearning Text-to-Image Generative Models

Dec 09, 2024
Via arXiv