
Zeming Wei

Towards the Worst-case Robustness of Large Language Models

Jan 31, 2025

MILE: A Mutation Testing Framework of In-Context Learning Systems

Sep 07, 2024

Automata Extraction from Transformers

Jun 08, 2024

A Theoretical Understanding of Self-Correction through In-context Alignment

May 28, 2024

Boosting Jailbreak Attack with Momentum

May 02, 2024

Exploring the Robustness of In-Context Learning with Noisy Labels

May 01, 2024

Towards General Conceptual Model Editing via Adversarial Representation Engineering

Apr 21, 2024

On the Duality Between Sharpness-Aware Minimization and Adversarial Training

Feb 23, 2024

Studious Bob Fight Back Against Jailbreaking via Prompt Adversarial Tuning

Feb 09, 2024

Jatmo: Prompt Injection Defense by Task-Specific Finetuning

Jan 08, 2024