Title: A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning

Nov 10, 2023

Valeriia Cherepanova, Roman Levin, Gowthami Somepalli, Jonas Geiping, C. Bayan Bruss, Andrew Gordon Wilson, Tom Goldstein, Micah Goldblum



Abstract: Academic tabular benchmarks often contain small sets of curated features. In contrast, data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones. To prevent overfitting in subsequent downstream modeling, practitioners commonly use automated feature selection methods that identify a reduced subset of informative features. Existing benchmarks for tabular feature selection consider only classical downstream models, rely on toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance. Motivated by the increasing popularity of tabular deep learning, we construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers, using real datasets and multiple methods for generating extraneous features. We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems such as selecting from corrupted or second-order features.
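To make the idea of an input-gradient-based analogue of Lasso more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not give the formulation, so everything below is an assumption made for illustration: the penalty is taken to be the sum over features of the root-mean-square, across the batch, of the loss gradient with respect to that input feature; the model is a simple logistic regression rather than a neural network; the optimizer uses finite-difference gradients; and the names `objective`, `fit`, `alpha`, and the toy data are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def objective(params, X, y, alpha):
    """Mean log loss plus alpha * sum over features of the batch RMS input gradient.

    Assumed form of the penalty (not taken from the paper).
    Returns (objective value, per-feature input-gradient scores).
    """
    w, b = params[:-1], params[-1]
    loss = 0.0
    sq = [0.0] * len(w)  # running sum of squared input gradients per feature
    for x_i, y_i in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, x_i)) + b)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
        loss -= y_i * math.log(p) + (1 - y_i) * math.log(1.0 - p)
        for j, wj in enumerate(w):
            # For a linear logit, d(loss_i)/d(x_ij) = (p_i - y_i) * w_j.
            sq[j] += ((p - y_i) * wj) ** 2
    scores = [math.sqrt(s / len(X)) for s in sq]
    return loss / len(X) + alpha * sum(scores), scores


def fit(X, y, alpha=0.05, lr=0.5, steps=100, eps=1e-5):
    """Gradient descent on the penalized objective using central differences."""
    params = [0.0] * (len(X[0]) + 1)  # weights followed by the bias
    for _ in range(steps):
        grad = []
        for k in range(len(params)):
            up, down = params[:], params[:]
            up[k] += eps
            down[k] -= eps
            diff = objective(up, X, y, alpha)[0] - objective(down, X, y, alpha)[0]
            grad.append(diff / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


# Toy data: features 0 and 1 determine the label, features 2 and 3 are noise.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [1 if x[0] - x[1] > 0 else 0 for x in X]

params = fit(X, y)
_, scores = objective(params, X, y, alpha=0.05)
ranking = sorted(range(4), key=lambda j: -scores[j])  # most important first
print("feature ranking:", ranking)
```

In this toy setting the per-feature scores double as importance values, so a selector would keep the top-ranked features and discard the rest. With a neural network, the analytic input gradient used here would be replaced by gradients obtained from automatic differentiation.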

* Conference on Neural Information Processing Systems 2023
