Cheng-Hung Liu

Exploring space efficiency in a tree-based linear model for extreme multi-label classification

Oct 12, 2024

He-Zhe Lin, Cheng-Hung Liu, Chih-Jen Lin

Abstract: Extreme multi-label classification (XMC) aims to identify relevant subsets from numerous labels. Among the various approaches for XMC, tree-based linear models are effective due to their superior efficiency and simplicity. However, the space complexity of tree-based methods is not well studied. Many past works assume that storing the model is not affordable and apply techniques such as pruning to save space, which may lead to performance loss. In this work, we conduct both theoretical and empirical analyses of the space needed to store a tree model under the assumption of sparse data, a condition frequently met in text data. We find that some features may be unused when training binary classifiers in a tree method, resulting in zero values in the weight vectors. Hence, storing only the non-zero elements can greatly save space. Our experimental results indicate that tree models can achieve up to a 95% reduction in storage space compared to the standard one-vs-rest method for multi-label text classification. Our research provides a simple procedure to estimate the size of a tree model before training any classifier in the tree nodes. If the estimated model size is already acceptable, this approach can help avoid modifying the model through weight pruning or other techniques.

* EMNLP 2024
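
The size-estimation idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' procedure: the toy data, the label tree, and the helper names (`labels_under`, `estimate_nonzeros`) are all hypothetical. It assumes that a node's classifiers can only have non-zero weights on features that appear in the training instances reaching that node, so counting those features gives an upper bound on stored weights before any training.

```python
from typing import Dict, List, Set

# Hypothetical toy data: instance i has the single non-zero feature i
# and the single label "l<i>" (a stand-in for sparse text data).
num_features = 8
instances: List[Set[int]] = [{i} for i in range(8)]
labels: List[Set[str]] = [{f"l{i}"} for i in range(8)]

# Hypothetical label tree: internal node -> children; leaves are labels.
tree: Dict[str, List[str]] = {
    "root": ["left", "right"],
    "left": ["l0", "l1", "l2", "l3"],
    "right": ["l4", "l5", "l6", "l7"],
}

def labels_under(node: str) -> Set[str]:
    """All leaf labels in the subtree rooted at `node`."""
    if node not in tree:
        return {node}
    out: Set[str] = set()
    for child in tree[node]:
        out |= labels_under(child)
    return out

def estimate_nonzeros(node: str = "root") -> int:
    """Upper bound on non-zero weights stored in the subtree at `node`."""
    if node not in tree:
        return 0  # leaves hold no classifier
    covered = labels_under(node)
    used: Set[int] = set()
    for x, y in zip(instances, labels):
        if y & covered:  # this instance reaches the node
            used |= x
    # One binary classifier per child, each limited to the used features.
    total = len(used) * len(tree[node])
    for child in tree[node]:
        total += estimate_nonzeros(child)
    return total

tree_nonzeros = estimate_nonzeros()
# Dense one-vs-rest baseline: one full-length weight vector per label.
ovr_dense = num_features * len(labels)
print(tree_nonzeros, ovr_dense)
```

In this toy setup the root sees every feature, but each child node sees only half of them, which is where the bound drops below the dense one-vs-rest count. The gap on real data depends on the tree shape and on how sparse the features are.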

On the Use of Unrealistic Predictions in Hundreds of Papers Evaluating Graph Representations

Dec 13, 2021

Li-Chung Lin, Cheng-Hung Liu, Chih-Ming Chen, Kai-Chin Hsu, I-Feng Wu, Ming-Feng Tsai, Chih-Jen Lin

Abstract: Prediction using the ground truth sounds like an oxymoron in machine learning. However, such an unrealistic setting was used in hundreds, if not thousands, of papers in the area of finding graph representations. To evaluate the multi-label problem of node classification using the obtained representations, many works assume in the prediction stage that the number of labels of each test instance is known. In practice, such ground-truth information is rarely available, but we point out that this inappropriate setting is now ubiquitous in this research area. We investigate in detail why this situation occurs. Our analysis indicates that with unrealistic information, the performance is likely over-estimated. To see why suitable predictions were not used, we identify difficulties in applying some multi-label techniques. For use in future studies, we propose simple and effective settings that do not rely on practically unknown information. Finally, we take this chance to conduct a fair and serious comparison of major graph-representation learning methods on multi-label node classification.

* Accepted by AAAI 2022
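
To make the evaluation issue in the abstract above concrete, here is a minimal sketch contrasting the two prediction settings. The scores, the label names, and the zero decision threshold are hypothetical; the thresholded rule is only one possible realistic alternative and is not necessarily the setting the authors propose.

```python
from typing import Dict, Set

def predict_top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """Return the k highest-scoring labels."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def predict_threshold(scores: Dict[str, float], threshold: float = 0.0) -> Set[str]:
    """Return every label whose decision value exceeds the threshold."""
    return {label for label, s in scores.items() if s > threshold}

# Hypothetical decision values for one test node, and its true labels.
scores = {"a": 0.9, "b": 0.2, "c": -0.1, "d": -0.8}
true_labels = {"a", "b", "c"}

# Unrealistic setting: the number of true labels is taken from the ground truth.
oracle = predict_top_k(scores, k=len(true_labels))

# Realistic setting: no ground-truth information is used at prediction time.
realistic = predict_threshold(scores)

print(oracle == true_labels, realistic == true_labels)
```

Here the oracle setting recovers all three labels only because it is told there are three, while the thresholded prediction misses label `c`. This is the kind of gap that can inflate reported scores.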
