Shiyao Cui

AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement
Feb 24, 2025

LongSafety: Evaluating Long-Context Safety of Large Language Models
Feb 24, 2025

Human Decision-making is Susceptible to AI-driven Manipulation
Feb 11, 2025

Agent-SafetyBench: Evaluating the Safety of LLM Agents
Dec 19, 2024

The Superalignment of Superhuman Intelligence with Large Language Models
Dec 15, 2024

Global Challenge for Safe and Secure LLMs Track 1
Nov 21, 2024

NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time
Aug 07, 2024

Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
Jul 03, 2024

Adaptive Data Augmentation for Aspect Sentiment Quad Prediction
Jan 12, 2024

FFT: Towards Harmlessness Evaluation and Analysis for LLMs with Factuality, Fairness, Toxicity
Nov 30, 2023