Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Rebecca Ruppel

Semantic Classification of Tabular Datasets via Character-Level Convolutional Neural Networks

Jan 24, 2019

Paul Azunre, Craig Corcoran, Numa Dhamani, Jeffrey Gleason, Garrett Honke, David Sullivan, Rebecca Ruppel, Sandeep Verma, Jonathon Morgan

Figure 1 for Semantic Classification of Tabular Datasets via Character-Level Convolutional Neural Networks

Figure 2 for Semantic Classification of Tabular Datasets via Character-Level Convolutional Neural Networks

Figure 3 for Semantic Classification of Tabular Datasets via Character-Level Convolutional Neural Networks

Figure 4 for Semantic Classification of Tabular Datasets via Character-Level Convolutional Neural Networks

Abstract:A character-level convolutional neural network (CNN) motivated by applications in "automated machine learning" (AutoML) is proposed to semantically classify columns in tabular data. Simulated data containing a set of base classes is first used to learn an initial set of weights. Hand-labeled data from the CKAN repository is then used in a transfer-learning paradigm to adapt the initial weights to a more sophisticated representation of the problem (e.g., including more classes). In doing so, realistic data imperfections are learned and the set of classes handled can be expanded from the base set with reduced labeled data and computing power requirements. Results show the effectiveness and flexibility of this approach in three diverse domains: semantic classification of tabular data, age prediction from social media posts, and email spam classification. In addition to providing further evidence of the effectiveness of transfer learning in natural language processing (NLP), our experiments suggest that analyzing the semantic structure of language at the character level without additional metadata---i.e., network structure, headers, etc.---can produce competitive accuracy for type classification, spam classification, and social media age prediction. We present our open-source toolkit SIMON, an acronym for Semantic Inference for the Modeling of ONtologies, which implements this approach in a user-friendly and scalable/parallelizable fashion.

Via

Access Paper or Ask Questions

Abstractive Tabular Dataset Summarization via Knowledge Base Semantic Embeddings

Apr 05, 2018

Paul Azunre, Craig Corcoran, David Sullivan, Garrett Honke, Rebecca Ruppel, Sandeep Verma, Jonathon Morgan

Figure 1 for Abstractive Tabular Dataset Summarization via Knowledge Base Semantic Embeddings

Figure 2 for Abstractive Tabular Dataset Summarization via Knowledge Base Semantic Embeddings

Figure 3 for Abstractive Tabular Dataset Summarization via Knowledge Base Semantic Embeddings

Figure 4 for Abstractive Tabular Dataset Summarization via Knowledge Base Semantic Embeddings

Abstract:This paper describes an abstractive summarization method for tabular data which employs a knowledge base semantic embedding to generate the summary. Assuming the dataset contains descriptive text in headers, columns and/or some augmenting metadata, the system employs the embedding to recommend a subject/type for each text segment. Recommendations are aggregated into a small collection of super types considered to be descriptive of the dataset by exploiting the hierarchy of types in a pre-specified ontology. Using February 2015 Wikipedia as the knowledge base, and a corresponding DBpedia ontology as types, we present experimental results on open data taken from several sources--OpenML, CKAN and data.world--to illustrate the effectiveness of the approach.

Via

Access Paper or Ask Questions