Kui Ren

School of Cyber Science and Technology, Zhejiang University

Towards LLM Guardrails via Sparse Representation Steering

Mar 21, 2025

Harnessing Frequency Spectrum Insights for Image Copyright Protection Against Diffusion Models

Mar 17, 2025

Sparse Autoencoder as a Zero-Shot Classifier for Concept Erasing in Text-to-Image Diffusion Models

Mar 12, 2025

Can Small Language Models Reliably Resist Jailbreak Attacks? A Comprehensive Evaluation

Mar 09, 2025

Towards Collaborative Anti-Money Laundering Among Financial Institutions

Feb 27, 2025

Towards Label-Only Membership Inference Attack against Pre-trained Large Language Models

Feb 26, 2025

REFINE: Inversion-Free Backdoor Defense via Model Reprogramming

Feb 22, 2025

CoKV: Optimizing KV Cache Allocation via Cooperative Game

Feb 21, 2025

Robust Watermarks Leak: Channel-Aware Feature Extraction Enables Adversarial Watermark Manipulation

Feb 10, 2025

Activation Approximations Can Incur Safety Vulnerabilities Even in Aligned LLMs: Comprehensive Analysis and Defense

Feb 02, 2025