Title: DefVerify: Do Hate Speech Models Reflect Their Dataset's Definition?

Oct 21, 2024

Urja Khurana, Eric Nalisnick, Antske Fokkens

Abstract: When building a predictive model, it is often difficult to ensure that domain-specific requirements are encoded by the model that will eventually be deployed. Consider researchers working on hate speech detection. They will have an idea of what is considered hate speech, but building a model that reflects their view accurately requires preserving those ideals throughout the workflow of data set construction and model training. Complications such as sampling bias, annotation bias, and model misspecification almost always arise, possibly resulting in a gap between the domain specification and the model's actual behavior upon deployment. To address this issue for hate speech detection, we propose DefVerify: a 3-step procedure that (i) encodes a user-specified definition of hate speech, (ii) quantifies to what extent the model reflects the intended definition, and (iii) tries to identify the point of failure in the workflow. We use DefVerify to find gaps between definition and model behavior when applied to six popular hate speech benchmark datasets.

* Preprint
