Title: Automatic Creation of Named Entity Recognition Datasets by Querying Phrase Representations

Oct 14, 2022

Hyunjae Kim, Jaehyo Yoo, Seunghyun Yoon, Jaewoo Kang

Abstract: Most weakly supervised named entity recognition (NER) models rely on domain-specific dictionaries provided by experts. This approach is infeasible in many domains where dictionaries do not exist. While a phrase retrieval model was used to construct pseudo-dictionaries with entities retrieved from Wikipedia automatically in a recent study, these dictionaries often have limited coverage because the retriever is likely to retrieve popular entities rather than rare ones. In this study, a phrase embedding search to efficiently create high-coverage dictionaries is presented. Specifically, the reformulation of natural language queries into phrase representations allows the retriever to search a space densely populated with various entities. In addition, we present a novel framework, HighGEN, that generates NER datasets with high-coverage dictionaries obtained using the phrase embedding search. HighGEN generates weak labels based on the distance between the embeddings of a candidate phrase and target entity type to reduce the noise in high-coverage dictionaries. We compare HighGEN with current weakly supervised NER models on six NER benchmarks and demonstrate the superiority of our models.
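
The distance-based weak labeling mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the distance metric, the threshold, or how the entity-type embedding is obtained. The sketch assumes cosine distance, a hypothetical threshold of 0.3, a type embedding computed as the mean of a few seed entity embeddings, and toy 3-dimensional vectors. All function names are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Average a list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def weak_label(candidates, type_embedding, entity_type, threshold=0.3):
    """Label a candidate phrase with entity_type only if its embedding lies
    within `threshold` (assumed value) of the type embedding; otherwise "O"."""
    labels = {}
    for phrase, embedding in candidates.items():
        distance = cosine_distance(embedding, type_embedding)
        labels[phrase] = entity_type if distance <= threshold else "O"
    return labels


# Toy data: made-up embeddings, not the output of any real phrase encoder.
disease_seeds = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
type_emb = mean_vector(disease_seeds)

candidates = {
    "influenza": [0.85, 0.15, 0.05],  # close to the type embedding
    "Monday": [0.0, 0.1, 0.9],        # far from the type embedding
}
labels = weak_label(candidates, type_emb, "DISEASE")
print(labels)
```

Under these assumptions, a phrase that matches a noisy dictionary entry but sits far from the target type in embedding space is left unlabeled, which is the noise-reduction idea the abstract describes.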
