Dmitriy Bespalov

TurboFuzzLLM: Turbocharging Mutation-based Fuzzing for Effectively Jailbreaking Large Language Models in Practice

Feb 21, 2025
Via arXiv

Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation

Jan 28, 2025
Via arXiv

TaeBench: Improving Quality of Toxic Adversarial Examples

Oct 08, 2024
Via arXiv

Towards Building a Robust Toxicity Predictor

Apr 09, 2024
Via arXiv

Latent Skill Discovery for Chain-of-Thought Reasoning

Dec 07, 2023
Via arXiv