Marvin Grimm

PMLBmini: A Tabular Classification Benchmark Suite for Data-Scarce Applications

Sep 03, 2024

Ricardo Knauer, Marvin Grimm, Erik Rodner

Abstract: In practice, we are often faced with small-sized tabular data. However, current tabular benchmarks are not geared towards data-scarce applications, making it very difficult to derive meaningful conclusions from empirical comparisons. We introduce PMLBmini, a tabular benchmark suite of 44 binary classification datasets with sample sizes $\leq 500$. We use our suite to thoroughly evaluate current automated machine learning (AutoML) frameworks, off-the-shelf tabular deep neural networks, as well as classical linear models in the low-data regime. Our analysis reveals that state-of-the-art AutoML and deep learning approaches often fail to appreciably outperform even a simple logistic regression baseline, but we also identify scenarios where AutoML and deep learning methods are indeed reasonable to apply. Our benchmark suite, available at https://github.com/RicardoKnauer/TabMini, allows researchers and practitioners to analyze their own methods and challenge their data efficiency.

* AutoML 2024 Workshop Track
