Abstract: We consider sequential maximization of performance metrics that are general functions of the confusion matrix of a classifier (such as precision, F-measure, or G-mean). Such metrics are, in general, non-decomposable over individual instances, making their optimization very challenging. While they have been extensively studied under different frameworks in the batch setting, their analysis in the online learning regime is very limited, with only a few distinguished exceptions. In this paper, we introduce and analyze a general online algorithm that can be used in a straightforward way with a variety of complex performance metrics in binary, multi-class, and multi-label classification problems. The algorithm's update and prediction rules are appealingly simple and computationally efficient, without the need to store any past data. We show that the algorithm attains $\mathcal{O}(\frac{\ln n}{n})$ regret for concave and smooth metrics and verify the efficiency of the proposed algorithm in empirical studies.
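A minimal sketch of the kind of update and prediction rules referred to above, assuming access to a streaming estimate of $P(y=1 \mid x)$: keep a running confusion matrix and, at each round, make the cost-sensitive prediction induced by the metric's gradient at that matrix. The helper names (`stream`, `predict_proba`) and the finite-difference gradient are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def numerical_gradient(metric, C, h=1e-6):
    """Finite-difference gradient of a confusion-matrix metric at C."""
    G = np.zeros_like(C)
    for i in range(2):
        for j in range(2):
            Cp = C.copy()
            Cp[i, j] += h
            G[i, j] = (metric(Cp) - metric(C)) / h
    return G

def online_metric_maximization(stream, predict_proba, metric):
    """Keep a running (normalized) confusion matrix C[y, y_hat]; at each round
    derive a cost-sensitive decision from the metric's gradient at C, predict,
    and then update C with the observed outcome."""
    C = np.full((2, 2), 1e-8)   # rows: true label, columns: prediction
    t = 0
    for x, y in stream:         # y in {0, 1}
        p = predict_proba(x)    # estimate of P(y = 1 | x)
        G = numerical_gradient(metric, C)
        # expected marginal gain of predicting 1 vs. predicting 0
        gain1 = p * G[1, 1] + (1 - p) * G[0, 1]
        gain0 = p * G[1, 0] + (1 - p) * G[0, 0]
        y_hat = int(gain1 >= gain0)
        t += 1
        C *= (t - 1) / t
        C[y, y_hat] += 1.0 / t
        yield y_hat

# example metric: F1 written as a function of the confusion matrix
f1 = lambda C: 2 * C[1, 1] / (2 * C[1, 1] + C[0, 1] + C[1, 0] + 1e-12)
```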
Abstract: We consider the optimization of complex performance metrics in multi-label classification under the population utility framework. We mainly focus on metrics linearly decomposable into a sum of binary classification utilities applied separately to each label, with the additional requirement that exactly $k$ labels are predicted for each instance. These "macro-at-$k$" metrics possess desirable properties for extreme classification problems with long-tail labels. Unfortunately, the at-$k$ constraint couples the otherwise independent binary classification tasks, leading to a much more challenging optimization problem than standard macro-averages. We provide a statistical framework to study this problem, prove the existence and the form of the optimal classifier, and propose a statistically consistent and practical learning algorithm based on the Frank-Wolfe method. Interestingly, our main results concern even more general metrics that are non-linear functions of label-wise confusion matrices. Empirical results provide evidence for the competitive performance of the proposed approach.
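Under the interpretation that, as in classical Frank-Wolfe schemes for non-decomposable metrics, the linear subproblem becomes a per-instance top-$k$ selection over label gains, a hedged sketch might look as follows. Here `eta` denotes plug-in estimates of the label marginals; the finite-difference gradient, initialization, and step rule are assumptions rather than the paper's specification.

```python
import numpy as np

def fw_macro_at_k(eta, utility, k, n_iters=50, h=1e-6):
    """eta[i, j] ~ P(y_j = 1 | x_i); utility maps stacked per-label expected
    confusion matrices C = [tp, fp, fn] (shape 3 x m) to a scalar.  Returns a
    randomized classifier as a list of (weight, per-instance label gains);
    each component predicts the top-k labels according to its gains."""
    def top_k_confusion(gains):
        pred = np.zeros_like(eta)
        topk = np.argpartition(-gains, k - 1, axis=1)[:, :k]
        np.put_along_axis(pred, topk, 1.0, axis=1)
        tp = (pred * eta).mean(axis=0)
        fp = (pred * (1 - eta)).mean(axis=0)
        fn = ((1 - pred) * eta).mean(axis=0)
        return np.stack([tp, fp, fn])

    C = top_k_confusion(eta)               # start from plain top-k of eta
    mixture = [(1.0, eta)]
    for t in range(1, n_iters + 1):
        # finite-difference gradient of the utility at the current point
        G = np.zeros_like(C)
        for idx in np.ndindex(C.shape):
            Cp = C.copy(); Cp[idx] += h
            G[idx] = (utility(Cp) - utility(C)) / h
        g_tp, g_fp, g_fn = G
        # linear subproblem: best top-k classifier for the linearized utility
        gains = eta * g_tp + (1 - eta) * g_fp - eta * g_fn
        step = 2.0 / (t + 2)
        mixture = [(w * (1 - step), g) for w, g in mixture] + [(step, gains)]
        C = (1 - step) * C + step * top_k_confusion(gains)
    return mixture

# example: macro-averaged F1 over label-wise expected confusion matrices
macro_f1 = lambda C: np.mean(2 * C[0] / (2 * C[0] + C[1] + C[2] + 1e-12))
```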
Abstract: Extreme multi-label classification (XMLC) is the task of selecting a small subset of relevant labels from a very large set of possible labels. As such, it is characterized by long-tail labels, i.e., most labels have very few positive instances. With standard performance measures such as precision@k, a classifier can ignore tail labels and still report good performance. However, it is often argued that correct predictions in the tail are more interesting or rewarding, and the community has not yet settled on a metric capturing this intuitive concept. The existing propensity-scored metrics fall short of this goal by confounding the problems of long-tail and missing labels. In this paper, we analyze generalized metrics budgeted "at k" as an alternative solution. To tackle the challenging problem of optimizing these metrics, we formulate it in the expected test utility (ETU) framework, which aims at optimizing the expected performance on a fixed test set. We derive optimal prediction rules and construct computationally efficient approximations with provable regret guarantees and robustness against model misspecification. Our algorithm, based on block coordinate ascent, scales effortlessly to XMLC problems and obtains promising results in terms of long-tail performance.
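For the special case of macro-averaged metrics with expected counts as a plug-in approximation, a block coordinate ascent pass over instances can be sketched as below; the per-label utility, the initialization from top-$k$ marginals, and the number of passes are illustrative choices, not necessarily those of the paper.

```python
import numpy as np

def bca_macro_at_k(eta, label_utility, k, n_iters=5):
    """Block coordinate ascent sketch for budgeted-at-k macro metrics under
    expected test utility.  eta[i, j] ~ P(y_j = 1 | x_i); label_utility(tp,
    fp, fn) is a binary utility (e.g. F1) applied per label; expected counts
    serve as a plug-in approximation of the metric."""
    n, m = eta.shape
    pred = np.zeros((n, m))
    init = np.argpartition(-eta, k - 1, axis=1)[:, :k]
    np.put_along_axis(pred, init, 1.0, axis=1)        # start from top-k of eta

    for _ in range(n_iters):
        for i in range(n):
            pred[i] = 0.0                             # remove instance i's block
            tp = (pred * eta).sum(axis=0)
            fp = (pred * (1 - eta)).sum(axis=0)
            fn = ((1 - pred) * eta).sum(axis=0) - eta[i]   # exclude instance i
            # gain of predicting label j for instance i vs. leaving it out
            gain = np.array([
                label_utility(tp[j] + eta[i, j], fp[j] + 1 - eta[i, j], fn[j])
                - label_utility(tp[j], fp[j], fn[j] + eta[i, j])
                for j in range(m)
            ])
            best = np.argpartition(-gain, k - 1)[:k]  # re-select k labels
            pred[i, best] = 1.0
    return pred

# example per-label utility: F1
f1 = lambda tp, fp, fn: 2 * tp / (2 * tp + fp + fn + 1e-12)
```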
Abstract: The propensity model introduced by Jain et al. (2016) has become a standard approach for dealing with missing and long-tail labels in extreme multi-label classification (XMLC). In this paper, we critically revisit this approach, showing that despite its theoretical soundness, its application in contemporary XMLC works is debatable. We exhaustively discuss the flaws of the propensity-based approach and present several recipes, some of them related to solutions used in search engines and recommender systems, that we believe constitute promising alternatives to be followed in XMLC.
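For reference, the propensity model under discussion assigns each label an observation probability that grows with its training frequency; a short sketch of the standard formula and of (unnormalized) propensity-scored precision@k is given below. The default constants are the ones commonly used for XMLC benchmarks; reported results usually additionally normalize PSP@k by its maximum attainable value.

```python
import numpy as np

def jain_propensities(label_counts, n, a=0.55, b=1.5):
    """Propensity of observing each label under the Jain et al. (2016) model:
    p_l = 1 / (1 + c * exp(-a * log(N_l + b))) with c = (log n - 1) * (b + 1)**a,
    where N_l is the number of training instances with label l and n is the
    total number of training instances; a and b are dataset-specific constants
    (the values above are commonly used defaults)."""
    c = (np.log(n) - 1) * (b + 1) ** a
    return 1.0 / (1.0 + c * np.exp(-a * np.log(label_counts + b)))

def psp_at_k(y_true, ranked_labels, propensities, k):
    """Unnormalized propensity-scored precision@k for a single instance:
    every hit in the top-k list is up-weighted by its inverse propensity."""
    top = np.asarray(ranked_labels[:k])
    return np.sum(y_true[top] / propensities[top]) / k
```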
Abstract: Set-valued prediction is a well-known concept in multi-class classification. When a classifier is uncertain about the class label for a test instance, it can predict a set of classes instead of a single class. In this paper, we focus on hierarchical multi-class classification problems, where valid sets (typically) correspond to internal nodes of the hierarchy. We argue that this is a very strong restriction, and we propose a relaxation by introducing the notion of representation complexity for a predicted set. In combination with probabilistic classifiers, this leads to a challenging inference problem for which specific combinatorial optimization algorithms are needed. We propose three methods and evaluate them on benchmark datasets: a na\"ive approach based on matrix-vector multiplication, a reformulation as a knapsack problem with a conflict graph, and a recursive tree search method. Experimental results demonstrate that the last method is computationally more efficient than the other two approaches, due to a hierarchical factorization of the conditional class distribution.
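As an illustration of the matrix-vector route, assume (as one common choice, not necessarily the paper's exact utility family) that predicting a set $\hat{Y}$ yields utility $g(|\hat{Y}|)$ when the true class lies in $\hat{Y}$ and 0 otherwise; then the expected utilities of all candidate sets follow from a single matrix-vector product:

```python
import numpy as np

def best_candidate_set(candidates, p, g):
    """candidates: (n_sets, n_classes) binary matrix, one row per candidate set
    (which rows appear is determined by the representation-complexity budget);
    p: conditional class probabilities; g: set-size penalty.  Expected utility
    of a set is g(|S|) * P(true class in S)."""
    coverage = candidates @ p                  # P(true class in each set)
    sizes = candidates.sum(axis=1)
    expected_utility = g(sizes) * coverage
    return candidates[np.argmax(expected_utility)]

# example: precision-like utility g(s) = 1/s
# best_candidate_set(C, p, lambda s: 1.0 / s)
```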
Abstract: Extreme multi-label classification (XMLC) refers to the task of tagging instances with small subsets of relevant labels coming from an extremely large set of all possible labels. Recently, XMLC has been widely applied to diverse web applications such as automatic content labeling, online advertising, or recommendation systems. In such environments, the label distribution is often highly imbalanced, consisting mostly of very rare tail labels, and relevant labels can be missing. As a remedy to these problems, the propensity model has been introduced and applied within several XMLC algorithms. In this work, we focus on the problem of optimal predictions under this model for probabilistic label trees, a popular approach to XMLC problems. We introduce an inference procedure, based on the $A^*$-search algorithm, that efficiently finds the optimal solution, assuming that all probabilities and propensities are known. We demonstrate the attractiveness of this approach in a wide empirical study on popular XMLC benchmark datasets.
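A hedged sketch of how such an $A^*$-style search might look: the optimistic bound of an internal node multiplies its path probability by the largest inverse propensity in its subtree, so the first $k$ leaves popped from the priority queue are guaranteed to be optimal. The tree/node interface (`is_leaf`, `children`, `cond_prob`, `subtree_max_inv_prop`) is an assumed API for illustration, not the paper's implementation.

```python
import heapq
from itertools import count

def plt_astar_top_k(root, k, inv_propensity, subtree_max_inv_prop, cond_prob):
    """Best-first search over a probabilistic label tree.  A leaf's score is
    its path probability times the inverse propensity of its label; internal
    nodes carry the optimistic bound path_prob * (max inverse propensity in
    their subtree), which never underestimates any leaf below them."""
    tie = count()                                    # tie-breaker for the heap
    frontier = [(-subtree_max_inv_prop(root), next(tie), 1.0, root)]
    results = []
    while frontier and len(results) < k:
        neg_bound, _, path_p, node = heapq.heappop(frontier)
        if node.is_leaf:
            # exact score; no remaining bound can exceed it, so it is optimal
            results.append((node.label, path_p * inv_propensity(node.label)))
            continue
        for child in node.children:
            p = path_p * cond_prob(child)            # ~ P(child | parent, x)
            bound = p * subtree_max_inv_prop(child)
            heapq.heappush(frontier, (-bound, next(tie), p, child))
    return results
```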
Abstract: We introduce online probabilistic label trees (OPLTs), an algorithm that trains a label tree classifier in a fully online manner, without any prior knowledge about the number of training instances, their features, or their labels. OPLTs are characterized by low time and space complexity as well as strong theoretical guarantees. They can be used for online multi-label and multi-class classification, including the very challenging scenarios of one- or few-shot learning. We demonstrate the attractiveness of OPLTs in a wide empirical study on several instances of the tasks mentioned above.
Abstract: In cases of uncertainty, a multi-class classifier preferably returns a set of candidate classes instead of predicting a single class label with little guarantee. More precisely, the classifier should strive for an optimal balance between the correctness (the true class is among the candidates) and the precision (the candidates are not too many) of its prediction. We formalize this problem within a general decision-theoretic framework that unifies most of the existing work in this area. In this framework, uncertainty is quantified in terms of conditional class probabilities, and the quality of a predicted set is measured in terms of a utility function. We then address the problem of finding the Bayes-optimal prediction, i.e., the subset of class labels with the highest expected utility. This problem is computationally challenging, as the number of candidate predictions is exponential in the number of classes; we propose efficient algorithms that can be applied to a broad family of utility scores. Two of these algorithms make use of structural information in the form of a class hierarchy, which is often available in prediction problems with many classes. Our theoretical results are complemented by experimental studies, in which we analyze the proposed algorithms in terms of predictive accuracy and runtime efficiency.
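For one representative family of utility scores, where predicting a set $\hat{Y}$ gives $g(|\hat{Y}|)$ if the true class belongs to $\hat{Y}$ and 0 otherwise, the Bayes-optimal prediction is always a set of top-probability classes, so it can be found by a single scan over set sizes. A sketch (without exploiting a class hierarchy) is given below; the function `g` is assumed to be vectorized.

```python
import numpy as np

def bayes_optimal_set(p, g):
    """p: conditional class probabilities for one instance; g: set-size
    penalty.  For each size s the best set of that size consists of the s
    most probable classes, so it suffices to scan over sizes."""
    order = np.argsort(-p)                 # classes by decreasing probability
    coverage = np.cumsum(p[order])         # P(true class in the top-s set)
    sizes = np.arange(1, len(p) + 1)
    expected_utility = g(sizes) * coverage
    s = int(np.argmax(expected_utility)) + 1
    return order[:s], expected_utility[s - 1]

# example utility: an F1-like score g(s) = 2 / (1 + s)
# best_set, value = bayes_optimal_set(p, lambda s: 2.0 / (1.0 + s))
```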
Abstract: Extreme multi-label classification (XMLC) is a problem of tagging an instance with a small subset of relevant labels chosen from an extremely large pool of possible labels. Large label spaces can be efficiently handled by organizing labels as a tree, as in the hierarchical softmax (HSM) approach commonly used for multi-class problems. In this paper, we investigate probabilistic label trees (PLTs), which have recently been devised for tackling XMLC problems. We show that PLTs are a no-regret multi-label generalization of HSM when precision@k is used as the model evaluation metric. Critically, we prove that the pick-one-label heuristic, a reduction technique from multi-label to multi-class that is routinely used along with HSM, is not consistent in general. We also show that our implementation of PLTs, referred to as extremeText (XT), obtains significantly better results than HSM with the pick-one-label heuristic and than XML-CNN, a deep network specifically designed for XMLC problems. Moreover, XT is competitive with many state-of-the-art approaches in terms of statistical performance, model size, and prediction time, which makes it amenable to deployment in online systems.
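The key quantity behind a PLT is the chain-rule factorization of a label's marginal probability along its root-to-leaf path. A minimal sketch, assuming a node interface with a `parent` pointer and a trained estimator `node_prob(v)` of $P(z_v = 1 \mid z_{\mathrm{parent}(v)} = 1, x)$ (with the root term being the probability that the instance has at least one relevant label):

```python
def plt_label_probability(leaf, node_prob):
    """Marginal label probability in a probabilistic label tree:
    P(y_l = 1 | x) = product over the nodes v on the path from the leaf of
    label l up to the root of P(z_v = 1 | z_parent(v) = 1, x)."""
    p, node = 1.0, leaf
    while node is not None:
        p *= node_prob(node)      # conditional estimate for this node given x
        node = node.parent
    return p
```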
Abstract: We consider optimization of generalized performance metrics for binary classification by means of surrogate losses. We focus on a class of metrics that are linear-fractional functions of the false positive and false negative rates (examples include the $F_{\beta}$-measure, the Jaccard similarity coefficient, the AM measure, and many others). Our analysis concerns the following two-step procedure. First, a real-valued function $f$ is learned by minimizing a surrogate loss for binary classification on the training sample. The surrogate loss is assumed to be a strongly proper composite loss function (examples include the logistic loss, squared-error loss, exponential loss, etc.). Then, given $f$, a threshold $\widehat{\theta}$ is tuned on a separate validation sample by direct optimization of the target performance metric. We show that the regret of the resulting classifier (obtained by thresholding $f$ at $\widehat{\theta}$), measured with respect to the target metric, is upper-bounded by the regret of $f$ measured with respect to the surrogate loss. We also extend our results to cover multi-label classification and provide regret bounds for micro- and macro-averaging measures. Our findings are further analyzed in a computational study on both synthetic and real data sets.
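A minimal sketch of the two-step procedure, with logistic regression (whose loss is a strongly proper composite loss) as the surrogate learner and a sweep over validation scores as the threshold search; the helper names and the use of scikit-learn are illustrative choices rather than the paper's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def two_step_threshold_tuning(X_train, y_train, X_val, y_val, metric):
    """Step 1: fit a real-valued scorer by minimizing the logistic surrogate.
    Step 2: tune a threshold on a separate validation sample by directly
    maximizing the target metric, given here as a function of the confusion
    counts (tp, fp, fn)."""
    scorer = LogisticRegression().fit(X_train, y_train)   # step 1: surrogate
    scores = scorer.decision_function(X_val)

    best_theta, best_value = 0.0, -np.inf
    for theta in np.unique(scores):                       # step 2: threshold sweep
        pred = (scores >= theta).astype(int)
        tp = np.sum((pred == 1) & (y_val == 1))
        fp = np.sum((pred == 1) & (y_val == 0))
        fn = np.sum((pred == 0) & (y_val == 1))
        value = metric(tp, fp, fn)
        if value > best_value:
            best_theta, best_value = theta, value
    return scorer, best_theta

# example target metric: the F-measure with beta = 1
f1 = lambda tp, fp, fn: (2 * tp) / (2 * tp + fp + fn + 1e-12)
```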