Title: IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus

Feb 22, 2024

Honghao Gui, Hongbin Ye, Lin Yuan, Ningyu Zhang, Mengshu Sun, Lei Liang, Huajun Chen

Abstract: Large Language Models (LLMs) demonstrate remarkable potential across various domains; however, they exhibit a significant performance gap in Information Extraction (IE). High-quality instruction data is key to enhancing specific capabilities of LLMs, yet current IE datasets tend to be small in scale, fragmented, and lacking a standardized schema. To this end, we introduce IEPile, a comprehensive bilingual (English and Chinese) IE instruction corpus containing approximately 0.32B tokens. We construct IEPile by collecting and cleaning 33 existing IE datasets, and introduce schema-based instruction generation to unearth a large-scale corpus. Experimental results on LLaMA and Baichuan demonstrate that using IEPile can enhance the performance of LLMs for IE, especially zero-shot generalization. We open-source the resource and pre-trained models, hoping to provide valuable support to the NLP community.

* Ongoing work; 18 pages; GitHub: https://github.com/zjunlp/IEPile
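
The abstract names schema-based instruction generation but does not spell out the procedure. As a rough illustration only, the sketch below shows one plausible reading: a dataset's label set (the schema) is split into small groups, and each group becomes its own instruction/answer pair for the same input text. The function names, the JSON field names, the instruction wording, and the group size are all assumptions made for this example, not the format used by IEPile.

```python
import json
from typing import Dict, List


def chunk_schema(labels: List[str], size: int) -> List[List[str]]:
    """Split a label list into consecutive groups of at most `size` labels."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [labels[i:i + size] for i in range(0, len(labels), size)]


def build_instructions(
    text: str,
    labels: List[str],
    annotations: Dict[str, List[str]],
    size: int = 2,
) -> List[Dict[str, str]]:
    """Build one instruction record per schema group (hypothetical format).

    Labels that have no annotation in `text` are kept in the answer with an
    empty list, so each answer covers exactly the labels asked for.
    """
    records = []
    for group in chunk_schema(labels, size):
        prompt = {
            # Assumed wording; the real corpus may phrase this differently.
            "instruction": "Extract entities of the listed types from the input.",
            "schema": group,
            "input": text,
        }
        answer = {label: annotations.get(label, []) for label in group}
        records.append({
            "prompt": json.dumps(prompt, ensure_ascii=False),
            "output": json.dumps(answer, ensure_ascii=False),
        })
    return records


if __name__ == "__main__":
    demo = build_instructions(
        "Alice works at Acme in Paris.",
        ["person", "organization", "location"],
        {"person": ["Alice"], "organization": ["Acme"], "location": ["Paris"]},
    )
    for record in demo:
        print(record["prompt"])
        print(record["output"])
```

Keeping each prompt's schema small means a single annotated sentence can yield several training records, which is one way a modest set of source datasets could grow into a larger instruction corpus. The repository linked above is the place to check the actual data format.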
