Donggil Kang

CAST: Cluster-Aware Self-Training for Tabular Data

Oct 10, 2023

Minwook Kim, Juseong Kim, Kibeom Kim, Donggil Kang, Giltae Song

Abstract: Self-training has gained traction because of its simplicity and versatility, yet it is vulnerable to noisy pseudo-labels. Several studies have proposed successful approaches to tackle this issue, but they diminish the advantages of self-training because they require specific modifications to self-training algorithms or model architectures. Furthermore, most of them are incompatible with gradient boosting decision trees, which dominate the tabular domain. To address this, we revisit the cluster assumption, which states that data samples that are close to each other tend to belong to the same class. Inspired by this assumption, we propose Cluster-Aware Self-Training (CAST) for tabular data. CAST is a simple and universally adaptable approach for enhancing existing self-training algorithms without significant modifications. Concretely, our method regularizes the confidence of the classifier, which represents the value of the pseudo-label, forcing the pseudo-labels in low-density regions to have lower confidence by leveraging prior knowledge for each class within the training data. Extensive empirical evaluations on up to 20 real-world datasets confirm not only the superior performance of CAST but also its robustness in various setups in self-training contexts.

* 17 pages with appendix
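
The idea of lowering pseudo-label confidence in low-density regions can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how density is estimated or how it is combined with the classifier's confidence. The sketch assumes a diagonal Gaussian per class fitted on the labeled data and a simple multiplicative scaling, and the function names (`fit_class_gaussians`, `density_score`, `regularize_confidence`) are hypothetical.

```python
import math


def fit_class_gaussians(X, y):
    """Fit a per-class diagonal Gaussian (mean, variance per feature).

    Assumption: this stands in for the "prior knowledge for each class";
    the paper may use a different density estimate.
    """
    stats = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n, d = len(rows), len(rows[0])
        mean = [sum(r[j] for r in rows) / n for j in range(d)]
        # Small constant keeps the variance strictly positive.
        var = [sum((r[j] - mean[j]) ** 2 for r in rows) / n + 1e-6
               for j in range(d)]
        stats[label] = (mean, var)
    return stats


def density_score(x, mean, var):
    """Return a score in [0, 1] that equals 1 at the class mean and
    decays with the variance-scaled squared distance from it."""
    d2 = sum((xj - m) ** 2 / v for xj, m, v in zip(x, mean, var))
    return math.exp(-0.5 * d2)


def regularize_confidence(x, probs, stats):
    """Scale the classifier's top confidence by the density score of the
    predicted class (multiplicative scaling is an assumption).

    probs maps each class label to the classifier's predicted probability.
    Returns (pseudo_label, regularized_confidence).
    """
    label = max(probs, key=probs.get)
    mean, var = stats[label]
    return label, probs[label] * density_score(x, mean, var)


# Toy labeled data: two well-separated clusters.
X_labeled = [[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2],
             [5.0, 5.0], [5.2, 4.9], [4.9, 5.1]]
y_labeled = [0, 0, 0, 1, 1, 1]
class_stats = fit_class_gaussians(X_labeled, y_labeled)

# Same raw confidence, different positions relative to the class-0 cluster.
near = regularize_confidence([0.05, 0.1], {0: 0.9, 1: 0.1}, class_stats)
far = regularize_confidence([2.5, 2.5], {0: 0.9, 1: 0.1}, class_stats)
print(near, far)
```

Because the adjustment touches only the confidence values, a self-training loop that selects pseudo-labels by thresholding confidence could use the regularized value in place of the raw one without changing the underlying classifier, which may be a gradient boosting model.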
