Title: TAEGAN: Generating Synthetic Tabular Data For Data Augmentation

Oct 02, 2024

Jiayu Li, Zilong Zhao, Kevin Yee, Uzair Javaid, Biplab Sikdar

Abstract: Synthetic tabular data generation has gained significant attention for its potential in data augmentation, software testing, and privacy-preserving data sharing. However, most research has focused on larger datasets and has evaluated synthetic data quality mainly with metrics such as column-wise statistical distributions and inter-feature correlations, often overlooking the utility of synthetic data for data augmentation, particularly for datasets where data is scarce. In this paper, we propose the Tabular Auto-Encoder Generative Adversarial Network (TAEGAN), an improved GAN-based framework for generating high-quality tabular data. Although large language model (LLM)-based methods represent the state of the art in synthetic tabular data generation, they are often overkill for small datasets due to their large size and complexity. TAEGAN employs a masked auto-encoder as the generator, which for the first time introduces self-supervised pre-training into tabular data generation, essentially exposing the networks to more information. We extensively evaluate TAEGAN against five state-of-the-art synthetic tabular data generation algorithms. Results on 10 datasets show that TAEGAN outperforms existing deep-learning-based tabular data generation models on 9 out of 10 datasets in machine learning efficacy and achieves superior data augmentation performance on 7 out of 8 smaller datasets.
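The abstract says TAEGAN uses a masked auto-encoder as its generator so that it can benefit from self-supervised pre-training. It does not spell out the masking scheme or loss, so the snippet below is only a generic sketch of the masked-reconstruction idea such pre-training commonly relies on, not TAEGAN's actual method. It assumes purely numeric rows, independent uniform random masking of cells, a fill value of 0.0 for hidden cells, and a squared-error loss counted only on the hidden cells. The helper names `mask_rows` and `masked_mse` are hypothetical, and categorical columns, the network itself, and the adversarial (GAN) training are all omitted.

```python
import random
from typing import List, Tuple

Row = List[float]


def mask_rows(rows: List[Row], mask_prob: float, seed: int = 0,
              fill: float = 0.0) -> Tuple[List[Row], List[List[bool]]]:
    """Hypothetical helper: hide each cell independently with probability mask_prob.

    Returns (masked_rows, mask), where mask[i][j] is True if cell (i, j) was hidden
    and masked_rows holds `fill` in every hidden cell. This is an assumed uniform
    masking scheme, not the one used by TAEGAN.
    """
    if not 0.0 <= mask_prob <= 1.0:
        raise ValueError("mask_prob must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the same seed reproduces the same mask
    masked: List[Row] = []
    mask: List[List[bool]] = []
    for row in rows:
        hidden = [rng.random() < mask_prob for _ in row]
        masked.append([fill if h else v for v, h in zip(row, hidden)])
        mask.append(hidden)
    return masked, mask


def masked_mse(pred: List[Row], target: List[Row],
               mask: List[List[bool]]) -> float:
    """Hypothetical loss: mean squared error over hidden cells only.

    Returns 0.0 when no cell was masked, so the caller never divides by zero.
    """
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, hidden in zip(p_row, t_row, m_row):
            if hidden:
                total += (p - t) ** 2
                count += 1
    return total / count if count else 0.0
```

In a pre-training loop built on this sketch, a network would receive `masked` (and often `mask` as an extra input) and be trained to minimise `masked_mse` against the original rows. Scoring only the hidden cells keeps the model from being rewarded for copying visible values straight through.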
