Xinyue Shen

HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns

Jan 28, 2025

Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media

Dec 24, 2024

Voice Jailbreak Attacks Against GPT-4o

May 29, 2024

UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images

May 06, 2024

Comprehensive Assessment of Jailbreak Attacks Against LLMs

Feb 08, 2024

"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models

Aug 07, 2023

Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models

May 23, 2023

In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT

Apr 18, 2023

MGTBench: Benchmarking Machine-Generated Text Detection

Mar 26, 2023

Prompt Stealing Attacks Against Text-to-Image Generation Models

Feb 20, 2023