Ian Kivlichan

Tony

Rule Based Rewards for Language Model Safety

Nov 02, 2024
Via arXiv

GPT-4o System Card

Oct 25, 2024
Via arXiv

Modeling subjectivity (by Mimicking Annotator Annotation) in toxic comment identification across diverse communities

Nov 01, 2023
Via arXiv

Is Your Toxicity My Toxicity? Exploring the Impact of Rater Identity on Toxicity Annotation

May 01, 2022
Via arXiv

Whose Ground Truth? Accounting for Individual and Collective Identities Underlying Dataset Annotation

Dec 08, 2021
Via arXiv