
Xuandong Zhao

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models

Mar 19, 2025

Improving LLM Safety Alignment with Dual-Objective Optimization

Mar 05, 2025

Reward Shaping to Mitigate Reward Hacking in RLHF

Feb 26, 2025

Scalable Best-of-N Selection for Large Language Models via Self-Certainty

Feb 25, 2025

DIS-CO: Discovering Copyrighted Content in VLMs Training Data

Feb 25, 2025

The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1

Feb 18, 2025

PrivAgent: Agentic-based Red-teaming for LLM Privacy Leakage

Dec 07, 2024

A Practical Examination of AI-Generated Text Detectors for Large Language Models

Dec 06, 2024

SoK: Watermarking for AI-Generated Content

Nov 27, 2024

An undetectable watermark for generative image models

Oct 09, 2024