Paul Röttger

University of Oxford

IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance

Feb 12, 2025
Via arXiv

Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations

Feb 10, 2025
Via arXiv

MSTS: A Multimodal Safety Test Suite for Vision-Language Models

Jan 17, 2025
Via arXiv

AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages

Jan 15, 2025
Via arXiv

HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter

Nov 23, 2024
Via arXiv

Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation

Oct 04, 2024
Via arXiv

Compromesso! Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models

Aug 08, 2024
Via arXiv

Evidence of a log scaling law for political persuasion with large language models

Jun 20, 2024
Via arXiv

From Languages to Geographies: Towards Evaluating Cultural Bias in Hate Speech Datasets

Apr 27, 2024
Via arXiv

Near to Mid-term Risks and Opportunities of Open Source Generative AI

Apr 25, 2024
Via arXiv