Hannah Rose Kirk

Beyond the Binary: Capturing Diverse Preferences With Reward Regularization

Dec 05, 2024

LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages

Jun 11, 2024

The AI Community Building the Future? A Quantitative Analysis of Development Activity on Hugging Face Hub

May 20, 2024

The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models

Apr 24, 2024

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Apr 18, 2024

Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models

Feb 26, 2024

SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models

Nov 14, 2023

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values

Oct 11, 2023

The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models

Oct 03, 2023

Casteist but Not Racist? Quantifying Disparities in Large Language Model Bias between India and the West

Sep 15, 2023