
Title: CTSyn: A Foundational Model for Cross Tabular Data Generation

Jun 07, 2024

Xiaofeng Lin, Chenheng Xu, Matthew Yang, Guang Cheng


Abstract: Generative Foundation Models (GFMs) have produced synthetic data of remarkable quality in modalities such as images and text. However, applying GFMs to tabular data poses significant challenges due to the inherent heterogeneity of table features. Existing cross-table learning frameworks are hindered by the absence of both a generative model backbone and a decoding mechanism for heterogeneous feature values. To overcome these limitations, we introduce the Cross-Table Synthesizer (CTSyn), a diffusion-based foundational model tailored for tabular data generation. CTSyn comprises three major components: an aggregator that consolidates heterogeneous tables into a unified latent space; a conditional latent diffusion model for sampling from this space; and type-specific decoders that reconstruct values of varied data types from sampled latent vectors. Extensive testing on real-world datasets reveals that CTSyn not only significantly outperforms existing table synthesizers in utility and diversity, but also uniquely improves the performance of downstream machine learning models beyond what is achievable with real data, thus establishing a new paradigm for synthetic data generation.
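
The three-stage data flow named in the abstract (aggregate, sample, decode) can be illustrated with a toy sketch. This is not the authors' implementation: every function name, the hash-seeded "embeddings", and the example rows are hypothetical, and the sampling step is a placeholder that contracts random noise toward a conditioning vector rather than a trained latent diffusion model. It only shows how rows with different schemas could share one fixed-size latent space and be decoded per data type.

```python
import random
import zlib

LATENT_DIM = 16  # assumed size of the shared latent space


def name_vector(text):
    # Deterministic pseudo-embedding of a string; a stand-in for a learned text encoder.
    rng = random.Random(zlib.crc32(text.encode("utf-8")))
    return [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def aggregate_row(row):
    """Stage 1 (toy aggregator): map a non-empty row with any schema to one latent vector."""
    cells = []
    for column, value in row.items():
        col_vec = name_vector(column)
        if isinstance(value, (int, float)):
            # Numeric cell: scale the column vector by the value.
            cells.append([float(value) * c for c in col_vec])
        else:
            # Categorical cell: add the column vector and the category vector.
            val_vec = name_vector(str(value))
            cells.append([c + v for c, v in zip(col_vec, val_vec)])
    # Mean-pool the cell vectors so every row has the same dimensionality.
    return [sum(dim) / len(cells) for dim in zip(*cells)]


def sample_latent(condition, steps=10, seed=0):
    """Stage 2 (placeholder, NOT real diffusion): start from noise, move toward the condition."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
    for _ in range(steps):
        z = [zi + 0.5 * (ci - zi) for zi, ci in zip(z, condition)]
    return z


def decode_numeric(z, column):
    """Stage 3a (toy numeric decoder): project the latent onto the column direction."""
    v = name_vector(column)
    return dot(z, v) / dot(v, v)


def decode_categorical(z, column, categories):
    """Stage 3b (toy categorical decoder): pick the category whose vector best matches z."""
    col_vec = name_vector(column)

    def score(category):
        target = [c + v for c, v in zip(col_vec, name_vector(category))]
        return dot(z, target)

    return max(categories, key=score)


# Two hypothetical rows from tables with different schemas share one latent space.
z_a = aggregate_row({"age": 42, "occupation": "nurse"})
z_b = aggregate_row({"income": 55000.0, "city": "Austin"})
z_new = sample_latent(condition=z_a)
print(len(z_a), len(z_b), len(z_new))
print(decode_categorical(z_new, "occupation", ["nurse", "teacher", "engineer"]))
```

Keeping the decoders separate per data type mirrors the abstract's "type-specific decoders": a numeric head returns a scalar, while a categorical head chooses among a known set of values.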
