Michael Backes

Voice Jailbreak Attacks Against GPT-4o

May 29, 2024

Link Stealing Attacks Against Inductive Graph Neural Networks

May 09, 2024

UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images

May 06, 2024

Rapid Adoption, Hidden Risks: The Dual Impact of Large Language Model Customization

Feb 15, 2024

Comprehensive Assessment of Jailbreak Attacks Against LLMs

Feb 08, 2024

Conversation Reconstruction Attack Against GPT Models

Feb 05, 2024

TrustLLM: Trustworthiness in Large Language Models

Jan 25, 2024

Memorization in Self-Supervised Learning Improves Downstream Generalization

Jan 24, 2024

FAKEPCD: Fake Point Cloud Detection via Source Attribution

Dec 18, 2023

Generated Distributions Are All You Need for Membership Inference Attacks Against Generative Models

Oct 30, 2023