Title: Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions

Jul 28, 2022

Yanai Elazar, Nora Kassner, Shauli Ravfogel, Amir Feder, Abhilasha Ravichander, Marius Mosbach, Yonatan Belinkov, Hinrich Schütze, Yoav Goldberg

Abstract: Large amounts of training data are one of the major reasons for the high performance of state-of-the-art NLP models. But what exactly in the training data causes a model to make a certain prediction? We seek to answer this question by providing a language for describing how training data influences predictions, through a causal framework. Importantly, our framework bypasses the need to retrain expensive models and allows us to estimate causal effects based on observational data alone. Addressing the problem of extracting factual knowledge from pretrained language models (PLMs), we focus on simple data statistics such as co-occurrence counts and show that these statistics do influence the predictions of PLMs, suggesting that such models rely on shallow heuristics. Our causal framework and our results demonstrate the importance of studying datasets and the benefits of causality for understanding NLP models.
