Dan I. Moldovan

CEREC: A Corpus for Entity Resolution in Email Conversations

Jun 02, 2021

Parag Pravin Dakle, Dan I. Moldovan

Figures 1-4 for CEREC: A Corpus for Entity Resolution in Email Conversations

Abstract: We present the first large-scale corpus for entity resolution in email conversations (CEREC). The corpus consists of 6,001 email threads from the Enron Email Corpus, containing 36,448 email messages and 60,383 entity coreference chains. Annotation is carried out as a two-step process that requires minimal manual effort. Experiments evaluate different features and the performance of four baselines on the corpus. For the task of mention identification and coreference resolution, the best baseline reaches 59.2 F1, which leaves substantial room for improvement. An in-depth qualitative and quantitative error analysis is presented to understand the limitations of the baselines considered.

* Proceedings of the 28th International Conference on Computational Linguistics, pp. 339-349, 2020.