Sahar Abdelnabi

Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections

Oct 30, 2025
Via arXiv

Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies

Oct 16, 2025
Via arXiv

LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge

Jun 11, 2025
Via arXiv

Linear Control of Test Awareness Reveals Differential Compliance in Reasoning Models

May 20, 2025
Via arXiv

Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models

Feb 27, 2025
Via arXiv

Safety is Essential for Responsible Open-Ended Systems

Feb 06, 2025
Via arXiv

Hypothesizing Missing Causal Variables with LLMs

Sep 04, 2024
Via arXiv

Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition

Jun 12, 2024
Via arXiv

Are you still on track!? Catching LLM Task Drift with Activations

Jun 02, 2024
Via arXiv

Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?

Mar 11, 2024
Via arXiv