Katarzyna Kapusta

Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models

Mar 08, 2025
Via arXiv

DiffGuard: Text-Based Safety Checker for Diffusion Models

Nov 25, 2024
Via arXiv

When Federated Learning meets Watermarking: A Comprehensive Overview of Techniques for Intellectual Property Protection

Aug 07, 2023
Via arXiv