Zidi Xiong

GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning

Jun 13, 2024
Via arXiv

RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content

Mar 19, 2024
Via arXiv

BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models

Jan 20, 2024
Via arXiv

CBD: A Certified Backdoor Detector Based on Local Dominant Probability

Oct 26, 2023
Via arXiv

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

Jun 20, 2023
Via arXiv

UMD: Unsupervised Model Detection for X2X Backdoor Attacks

Jun 02, 2023
Via arXiv

Label-Smoothed Backdoor Attack

Feb 19, 2022
Via arXiv