Maarten Sap

Shammie

OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety

Jul 08, 2025
Via arXiv

The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains

Jul 08, 2025
Via arXiv

Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion Dynamics

Jun 14, 2025
Via arXiv

Words Like Knives: Backstory-Personalized Modeling and Detection of Violent Communication

May 27, 2025
Via arXiv

Breaking mBad! Supervised Fine-tuning for Cross-Lingual Detoxification

May 22, 2025
Via arXiv

SOTOPIA-S4: a user-friendly system for flexible, customizable, and large-scale social simulation

Apr 19, 2025
Via arXiv

Rethinking Theory of Mind Benchmarks for LLMs: Towards A User-Centered Perspective

Apr 15, 2025
Via arXiv

Out of Style: RAG's Fragility to Linguistic Variation

Apr 11, 2025
Via arXiv

PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages

Apr 06, 2025
Via arXiv

Un-Straightening Generative AI: How Queer Artists Surface and Challenge the Normativity of Generative AI Models

Mar 12, 2025
Via arXiv