
Xuandong Zhao

Reward Shaping to Mitigate Reward Hacking in RLHF

Feb 26, 2025

Scalable Best-of-N Selection for Large Language Models via Self-Certainty

Feb 25, 2025

DIS-CO: Discovering Copyrighted Content in VLMs Training Data

Feb 25, 2025

The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1

Feb 18, 2025

PrivAgent: Agentic-based Red-teaming for LLM Privacy Leakage

Dec 07, 2024

A Practical Examination of AI-Generated Text Detectors for Large Language Models

Dec 06, 2024

SoK: Watermarking for AI-Generated Content

Nov 27, 2024

An undetectable watermark for generative image models

Oct 09, 2024

Multimodal Situational Safety

Oct 08, 2024

Efficiently Identifying Watermarked Segments in Mixed-Source Texts

Oct 04, 2024