Anirudh Rathore

Evaluating and Characterizing Human Rationales

Oct 09, 2020

Samuel Carton, Anirudh Rathore, Chenhao Tan

Abstract: Two main approaches for evaluating the quality of machine-generated rationales are: 1) using human rationales as a gold standard; and 2) automated metrics based on how rationales affect model behavior. An open question, however, is how human rationales fare with these automatic metrics. Analyzing a variety of datasets and models, we find that human rationales do not necessarily perform well on these metrics. To unpack this finding, we propose improved metrics to account for model-dependent baseline performance. We then propose two methods to further characterize rationale quality, one based on model retraining and one on using "fidelity curves" to reveal properties such as irrelevance and redundancy. Our work leads to actionable suggestions for evaluating and characterizing rationales.

* 14 pages, 15 figures, to appear in EMNLP 2020. Code is available at https://github.com/BoulderDS/evaluating-human-rationales
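
To make the idea of "automated metrics based on how rationales affect model behavior" concrete, here is a minimal Python sketch of two commonly used fidelity-style scores, sufficiency and comprehensiveness, plus one possible way to adjust a score against an empty-rationale baseline. Everything in it is an assumption for illustration: the toy model `toy_prob`, the `<pad>` masking scheme, and the helper names are hypothetical, and the formulas are generic versions that may differ from the definitions used in the paper or in the linked repository.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[str]], float]


def toy_prob(tokens: Sequence[str]) -> float:
    """Hypothetical stand-in model: P(positive) rises with each cue word."""
    cues = {"great", "wonderful"}
    hits = sum(1 for t in tokens if t in cues)
    return min(1.0, 0.5 + 0.2 * hits)


def apply_mask(tokens: Sequence[str], mask: Sequence[int],
               keep: bool = True, pad: str = "<pad>") -> List[str]:
    """Keep rationale tokens (keep=True) or non-rationale tokens (keep=False).

    Tokens that are not kept are replaced by a pad symbol (an assumed scheme).
    """
    return [t if (m == 1) == keep else pad for t, m in zip(tokens, mask)]


def sufficiency(prob: Model, tokens: Sequence[str], mask: Sequence[int]) -> float:
    """1 minus the confidence lost when the model sees only the rationale."""
    full = prob(tokens)
    only_rationale = prob(apply_mask(tokens, mask, keep=True))
    return 1.0 - max(0.0, full - only_rationale)


def comprehensiveness(prob: Model, tokens: Sequence[str], mask: Sequence[int]) -> float:
    """Confidence lost when the rationale is removed from the input."""
    full = prob(tokens)
    without_rationale = prob(apply_mask(tokens, mask, keep=False))
    return max(0.0, full - without_rationale)


def baseline_adjusted(score: float, baseline: float) -> float:
    """Rescale a score so the empty-rationale baseline maps to 0 (assumed form)."""
    if baseline >= 1.0:
        return 0.0  # no headroom above the baseline
    return (score - baseline) / (1.0 - baseline)


tokens = ["a", "great", "and", "wonderful", "film"]
rationale = [0, 1, 0, 1, 0]      # hypothetical human rationale mask
empty = [0] * len(tokens)        # empty rationale used as the baseline

suff = sufficiency(toy_prob, tokens, rationale)
comp = comprehensiveness(toy_prob, tokens, rationale)
null_suff = sufficiency(toy_prob, tokens, empty)
adj_suff = baseline_adjusted(suff, null_suff)
```

In this toy setup the model keeps most of its confidence even with an empty rationale, so raw sufficiency looks high for any mask; comparing against the empty-rationale score is one way to see how much of that is due to the rationale itself.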
