
Bingsheng Chen

Did the Models Understand Documents? Benchmarking Models for Language Understanding in Document-Level Relation Extraction

Jun 20, 2023

Haotian Chen, Bingsheng Chen, Xiangdong Zhou


Abstract: Document-level relation extraction (DocRE) has attracted growing research interest. While models achieve consistent performance gains in DocRE, their underlying decision rules are still understudied: do they make the right predictions according to rationales? In this paper, we take the first step toward answering this question and introduce a new perspective on comprehensively evaluating a model. Specifically, we first annotate the rationales that humans consider in DocRE. We then investigate representative state-of-the-art (SOTA) DocRE models and find that their decision rules differ from those of humans. Using our proposed RE-specific attacks, we next demonstrate that this significant discrepancy in decision rules between models and humans severely damages the robustness of models and renders them inapplicable to real-world RE scenarios. After that, we introduce mean average precision (MAP) to evaluate the understanding and reasoning capabilities of models. Based on extensive experimental results, we finally appeal to future work to evaluate both the performance and the understanding ability of models when developing their applications. We make our annotations and code publicly available.
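The abstract does not spell out how MAP is computed, so the following is only a minimal sketch of the standard ranking definition, not the authors' implementation. It assumes (hypothetically) that a model's attribution scores have already been turned into a ranking of token indices for each prediction, and that the human-annotated rationale is a set of token indices. The function names `average_precision` and `mean_average_precision` are illustrative.

```python
from typing import Sequence, Set


def average_precision(ranked_tokens: Sequence[int], rationale: Set[int]) -> float:
    """Average precision of a token ranking against a set of rationale tokens.

    ranked_tokens: token indices sorted from most to least important
                   (assumed to come from some attribution method).
    rationale:     token indices that human annotators marked as evidence.
    """
    if not rationale:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, token in enumerate(ranked_tokens, start=1):
        if token in rationale:
            hits += 1
            # Precision at the rank where a rationale token is retrieved.
            precision_sum += hits / rank
    return precision_sum / len(rationale)


def mean_average_precision(
    rankings: Sequence[Sequence[int]], rationales: Sequence[Set[int]]
) -> float:
    """Mean of the per-example average precision scores."""
    scores = [average_precision(r, g) for r, g in zip(rankings, rationales)]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Two toy examples with made-up token indices.
    rankings = [[3, 1, 2, 0], [0, 1]]
    rationales = [{3, 2}, {1}]
    print(mean_average_precision(rankings, rationales))
```

Under this definition, a higher score means the tokens the model weights most heavily overlap more with the human rationales, and they do so near the top of the ranking.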
