Andy Zhou

RedCode: Risky Code Execution and Generation Benchmark for Code Agents

Nov 12, 2024
Via arXiv

KnowGraph: Knowledge-Enabled Anomaly Detection via Logical Reasoning on Graph Data

Oct 10, 2024
Via arXiv

Tamper-Resistant Safeguards for Open-Weight LLMs

Aug 01, 2024
Via arXiv

AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies

Jun 25, 2024
Via arXiv

Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters

May 30, 2024
Via arXiv

FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning

Apr 03, 2024
Via arXiv

GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models

Feb 05, 2024
Via arXiv

Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks

Feb 02, 2024
Via arXiv

Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models

Nov 02, 2023
Via arXiv

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

Oct 06, 2023
Via arXiv