Zeming Wei

MILE: A Mutation Testing Framework of In-Context Learning Systems

Sep 07, 2024 (via arXiv)

Automata Extraction from Transformers

Jun 08, 2024 (via arXiv)

A Theoretical Understanding of Self-Correction through In-context Alignment

May 28, 2024 (via arXiv)

Boosting Jailbreak Attack with Momentum

May 02, 2024 (via arXiv)

Exploring the Robustness of In-Context Learning with Noisy Labels

May 01, 2024 (via arXiv)

Towards General Conceptual Model Editing via Adversarial Representation Engineering

Apr 21, 2024 (via arXiv)

On the Duality Between Sharpness-Aware Minimization and Adversarial Training

Feb 23, 2024 (via arXiv)

Studious Bob Fight Back Against Jailbreaking via Prompt Adversarial Tuning

Feb 09, 2024 (via arXiv)

Jatmo: Prompt Injection Defense by Task-Specific Finetuning

Jan 08, 2024 (via arXiv)

Architecture Matters: Uncovering Implicit Mechanisms in Graph Contrastive Learning

Nov 05, 2023 (via arXiv)