Abstract: Extreme multi-label text classification (XMTC) has been a tough challenge in machine learning research and applications due to the sheer size of the label spaces and the severe data scarcity associated with the long tail of rare labels in highly skewed distributions. This paper addresses the challenge of tail-label prediction by proposing a novel approach that combines the effectiveness of a trained bag-of-words (BoW) classifier in generating informative label descriptions under severe data scarcity, and the power of neural embedding-based retrieval models in mapping input documents (as queries) to relevant label descriptions. The proposed approach achieves state-of-the-art performance on XMTC benchmark datasets and significantly outperforms the best existing methods in tail-label prediction. We also provide a theoretical analysis relating the BoW and neural models with respect to a performance lower bound.
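To make the two-stage idea concrete, here is a minimal sketch, assuming scikit-learn and sentence-transformers are available: a one-vs-rest BoW classifier is trained per label, its top-weighted words serve as a generated label description, and a neural sentence encoder then ranks labels for a document-as-query by embedding similarity. The encoder name, the five-word description length, and the toy data are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

# Toy training data (illustrative only): documents and their label sets.
docs = [
    "deep learning for image recognition",
    "convolutional networks classify images",
    "stock market price prediction models",
    "forecasting financial time series data",
]
doc_labels = [{"vision"}, {"vision"}, {"finance"}, {"finance"}]
label_ids = sorted({l for ls in doc_labels for l in ls})

# Step 1: train a one-vs-rest BoW classifier and read off, for each label,
# its top-weighted words as a generated "label description".
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)
vocab = np.array(vectorizer.get_feature_names_out())
descriptions = []
for lid in label_ids:
    y = np.array([1 if lid in ls else 0 for ls in doc_labels])
    clf = LogisticRegression().fit(X, y)
    top = np.argsort(clf.coef_[0])[::-1][:5]  # 5 words is an arbitrary choice
    descriptions.append(" ".join(vocab[top]))

# Step 2: embed documents (queries) and label descriptions with a neural
# encoder, then rank labels for each document by cosine similarity.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works
doc_emb = encoder.encode(docs)
label_emb = encoder.encode(descriptions)
scores = cosine_similarity(doc_emb, label_emb)
for doc, row in zip(docs, scores):
    ranked = [label_ids[i] for i in np.argsort(-row)]
    print(doc, "->", ranked)
```

Because the label descriptions are synthesized from classifier weights rather than taken from label metadata, the retrieval stage can score even rare labels for which few or no training documents exist, which is the point of the combination for the long tail.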
Abstract: Extreme multi-label text classification (XMTC) is the task of tagging each document with the relevant labels from a very large space of predefined categories. Recently, large pre-trained Transformer models have brought significant performance improvements to XMTC. These models typically use the embedding of the special [CLS] token to represent the semantics of the entire document as a global feature vector, and match it against candidate labels. However, we argue that such a global feature vector may not suffice to represent the different granularity levels of semantics in a document, and that complementing it with local word-level features can bring additional gains. Based on this insight, we propose an approach that combines the local and global features produced by Transformer models to improve the predictive power of the classifier. Our experiments show that the proposed model either outperforms or is comparable to the state-of-the-art methods on benchmark datasets.
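As a rough illustration of the global-plus-local idea, the sketch below, assuming PyTorch and HuggingFace transformers, concatenates the [CLS] embedding with a masked max-pooling over the word-level hidden states before a linear label classifier. The backbone name, the max-pooling aggregator, and plain concatenation are illustrative assumptions; the paper may combine the two feature types differently.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class GlobalLocalClassifier(nn.Module):
    def __init__(self, model_name="bert-base-uncased", num_labels=1000):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Classifier sees the global [CLS] vector and the pooled local
        # word-level features side by side.
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        states = out.last_hidden_state                    # (B, T, H)
        global_feat = states[:, 0]                        # [CLS] embedding
        # Local features: max-pool the word-level states, ignoring padding.
        mask = attention_mask.unsqueeze(-1).bool()
        local_feat = states.masked_fill(~mask, float("-inf")).max(dim=1).values
        return self.classifier(torch.cat([global_feat, local_feat], dim=-1))

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = GlobalLocalClassifier(num_labels=8)
batch = tokenizer(["a toy document"], return_tensors="pt",
                  padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])  # shape (1, 8)
```

The per-label logits would then be trained with a multi-label loss such as binary cross-entropy; the intent of the sketch is only to show one way local word-level evidence can complement the single global vector.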