Picture for Maarten Sap

Maarten Sap

Shammie

Minion: A Technology Probe for Resolving Value Conflicts through Expert-Driven and User-Driven Strategies in AI Companion Applications

Add code
Nov 11, 2024
Viaarxiv icon

SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation

Add code
Oct 22, 2024
Viaarxiv icon

BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data

Add code
Oct 21, 2024
Figure 1 for BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data
Figure 2 for BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data
Figure 3 for BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data
Figure 4 for BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data
Viaarxiv icon

Data Defenses Against Large Language Models

Add code
Oct 17, 2024
Figure 1 for Data Defenses Against Large Language Models
Figure 2 for Data Defenses Against Large Language Models
Figure 3 for Data Defenses Against Large Language Models
Figure 4 for Data Defenses Against Large Language Models
Viaarxiv icon

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions

Add code
Sep 26, 2024
Viaarxiv icon

AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents

Add code
Sep 13, 2024
Viaarxiv icon

On the Resilience of Multi-Agent Systems with Malicious Agents

Add code
Aug 02, 2024
Viaarxiv icon

Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance

Add code
Jul 10, 2024
Viaarxiv icon

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

Add code
Jun 26, 2024
Figure 1 for WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
Figure 2 for WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
Figure 3 for WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
Figure 4 for WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
Viaarxiv icon

HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs

Add code
May 27, 2024
Viaarxiv icon