Title: Text Grafting: Near-Distribution Weak Supervision for Minority Classes in Text Classification

Jun 17, 2024

Letian Peng, Yi Gu, Chengyu Dong, Zihan Wang, Jingbo Shang

Abstract: For extremely weakly supervised text classification, pioneering research generates pseudo labels by mining texts similar to the class names from the raw corpus, which may yield very few or even no samples for the minority classes. Recent works have started to generate relevant texts by prompting LLMs with the class names or definitions; however, there is a high risk that LLMs cannot generate in-distribution data (i.e., data similar to the corpus where the text classifier will be applied), leading to classifiers that do not generalize. In this paper, we combine the advantages of these two approaches and propose to bridge the gap via a novel framework, text grafting, which aims to obtain clean and near-distribution weak supervision for minority classes. Specifically, we first use LLM-based logits to mine masked templates from the raw corpus that have high potential for data synthesis into the target minority class. Then, the templates are filled by state-of-the-art LLMs to synthesize near-distribution texts that fall into the minority classes. Text grafting shows significant improvement over direct mining or synthesis on minority classes. We also use analysis and case studies to understand the properties of text grafting.
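The mine-then-fill idea in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not specify how the LLM-based logits are turned into scores, so `toy_score` below is a hypothetical stand-in for that scorer, and the names `make_template`, `mine_templates`, `keep_ratio`, and the `_` mask symbol are all assumptions made for illustration. The choice to keep high-scoring tokens and mask the rest is likewise an assumption.

```python
from typing import Callable, List, Tuple

MASK = "_"  # hypothetical mask symbol; the abstract does not specify one


def make_template(tokens: List[str], scores: List[float],
                  keep_ratio: float = 0.5) -> Tuple[str, float]:
    """Keep the highest-scoring tokens, mask the rest, merge adjacent masks.

    Returns the template string and the mean score of the kept tokens,
    used here as a toy "potential" for ranking templates.
    """
    n_keep = max(1, int(len(tokens) * keep_ratio))
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:n_keep])
    template: List[str] = []
    for i, tok in enumerate(tokens):
        if i in keep:
            template.append(tok)
        elif not template or template[-1] != MASK:
            template.append(MASK)  # collapse a run of masked tokens into one
    potential = sum(scores[i] for i in keep) / n_keep
    return " ".join(template), potential


def mine_templates(corpus: List[str],
                   score_fn: Callable[[List[str]], List[float]],
                   top_k: int = 2,
                   keep_ratio: float = 0.5) -> List[Tuple[str, float]]:
    """Build one template per corpus text and return the top_k by potential."""
    candidates = []
    for text in corpus:
        tokens = text.split()
        if tokens:
            candidates.append(make_template(tokens, score_fn(tokens), keep_ratio))
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_k]


def toy_score(tokens: List[str]) -> List[float]:
    """Placeholder scorer (assumption): stands in for LLM-logit-based scores."""
    style_words = {"the", "was", "but", "and", "i"}
    return [1.0 if t.lower() in style_words else 0.0 for t in tokens]
```

In a real pipeline, `toy_score` would be replaced by scores derived from an LLM, and each mined template would then be passed to an LLM with an instruction to fill the masked spans so that the result belongs to the target minority class.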
