Title: MEGClass: Text Classification with Extremely Weak Supervision via Mutually-Enhancing Text Granularities

Apr 04, 2023

Priyanka Kargupta, Tanay Komarlu, Susik Yoon, Xuan Wang, Jiawei Han

Abstract: Text classification typically requires a substantial amount of human-annotated data to serve as supervision, which is costly to obtain in dynamic emerging domains. Certain methods seek to address this problem by solely relying on the surface text of class names to serve as extremely weak supervision. However, existing methods fail to account for single-class documents discussing multiple topics. Both topic diversity and vague sentences may introduce noise into the document's underlying representation and consequently reduce the precision of the predicted class. Furthermore, current work focuses on text granularities (documents, sentences, or words) independently, which limits the degree of coarse- or fine-grained context that we can jointly extract from all three to identify significant subtext for classification. In order to address this problem, we propose MEGClass, an extremely weakly-supervised text classification method to exploit Mutually-Enhancing Text Granularities. Specifically, MEGClass constructs class-oriented sentence and class representations based on keywords for performing a sentence-level confidence-weighted label ensemble in order to estimate a document's initial class distribution. This serves as the target distribution for a multi-head attention network with a class-weighted contrastive loss. This network learns contextualized sentence representations and weights to form document representations that reflect its original document and sentence-level topic diversity. Retaining this heterogeneity allows MEGClass to select the most class-indicative documents to serve as iterative feedback for enhancing the class representations. Finally, these top documents are used to fine-tune a pre-trained text classifier. As demonstrated through extensive experiments on six benchmark datasets, MEGClass outperforms other weakly and extremely weakly supervised methods.
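The sentence-level confidence-weighted label ensemble mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `initial_class_distribution`, the use of cosine similarity, and the definition of confidence as the margin between the two closest classes are all assumptions made for illustration. It only shows the general idea that each sentence votes for its nearest class, with ambiguous sentences contributing less.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def initial_class_distribution(sentence_vecs, class_vecs):
    """Hypothetical confidence-weighted ensemble of sentence-level votes.

    Each sentence votes for its most similar class representation. The
    vote weight is an ASSUMED confidence score: the similarity margin
    between the best and second-best class for that sentence.
    """
    n_classes = len(class_vecs)
    votes = [0.0] * n_classes
    for sent in sentence_vecs:
        sims = [cosine(sent, cls) for cls in class_vecs]
        ranked = sorted(sims, reverse=True)
        # Vague sentences (small margin) contribute little to the vote.
        confidence = ranked[0] - ranked[1] if n_classes > 1 else 1.0
        votes[sims.index(ranked[0])] += confidence
    total = sum(votes)
    if total == 0:
        # No sentence is confident: fall back to a uniform distribution.
        return [1.0 / n_classes] * n_classes
    return [v / total for v in votes]


# Toy 2-D embeddings: three sentences lean toward class 0, one toward class 1.
class_vecs = [[1.0, 0.0], [0.0, 1.0]]
sentence_vecs = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
dist = initial_class_distribution(sentence_vecs, class_vecs)
print(dist)
```

In this toy input the third sentence is equally close to both classes, so its margin is zero and it does not affect the estimate. According to the abstract, a distribution of this kind then serves as the target for the attention network.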

* Code: https://github.com/pkargupta/MEGClass/
