Picture for Niloofar Mireshghallah

Niloofar Mireshghallah

AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text

Add code
Oct 05, 2024
Viaarxiv icon

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions

Add code
Sep 26, 2024
Viaarxiv icon

Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild

Add code
Jul 16, 2024
Figure 1 for Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild
Figure 2 for Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild
Figure 3 for Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild
Figure 4 for Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild
Viaarxiv icon

CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation

Add code
Jul 09, 2024
Viaarxiv icon

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

Add code
Jun 26, 2024
Figure 1 for WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
Figure 2 for WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
Figure 3 for WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
Figure 4 for WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
Viaarxiv icon

Breaking News: Case Studies of Generative AI's Use in Journalism

Add code
Jun 19, 2024
Viaarxiv icon

Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs

Add code
Mar 05, 2024
Viaarxiv icon

Do Membership Inference Attacks Work on Large Language Models?

Add code
Feb 12, 2024
Viaarxiv icon

A Roadmap to Pluralistic Alignment

Add code
Feb 07, 2024
Viaarxiv icon

A Block Metropolis-Hastings Sampler for Controllable Energy-based Text Generation

Add code
Dec 07, 2023
Viaarxiv icon