Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ibna Kowsar

No Imputation of Missing Values In Tabular Data Classification Using Incremental Learning

Apr 20, 2025

Manar D. Samad, Kazi Fuad B. Akhter, Shourav B. Rabbani, Ibna Kowsar

Abstract:Tabular data sets with varying missing values are prepared for machine learning using an arbitrary imputation strategy. Synthetic values generated by imputation models often concern data stakeholders about computational complexity, data quality, and data-driven outcomes. This paper eliminates these concerns by proposing no imputation incremental learning (NIIL) of tabular data with varying missing value rates and types. The proposed method incrementally learns partitions of overlapping feature sets while using attention masks to exclude missing values from attention scoring. The average classification performance rank order across 15 diverse tabular data sets highlights the superiority of NIIL over 11 state-of-the-art learning methods with or without missing value imputations. Further experiments substantiate the robustness of NIIL against varying missing value types and rates compared to methods that involve the imputation of missing values. Our empirical analysis reveals that a feature partition size of half of the original feature space is, computation-wise and accuracy-wise, the best choice for the proposed incremental learning. The proposed method is one of the first deep learning solutions that can effectively learn tabular data without requiring the imputation of missing values.

Via

Access Paper or Ask Questions

DeepIFSA: Deep Imputation of Missing Values Using Feature and Sample Attention

Jan 19, 2025

Ibna Kowsar, Shourav B. Rabbani, Yina Hou, Manar D. Samad

Abstract:Missing values of varying patterns and rates in real-world tabular data pose a significant challenge in developing reliable data-driven models. Existing missing value imputation methods use statistical and traditional machine learning, which are ineffective when the missing rate is high and not at random. This paper explores row and column attention in tabular data to address the shortcomings of existing methods by introducing a new method for imputing missing values. The method combines between-feature and between-sample attention learning in a deep data reconstruction framework. The proposed data reconstruction uses CutMix data augmentation within a contrastive learning framework to improve the uncertainty of missing value estimation. The performance and generalizability of trained imputation models are evaluated on set-aside test data folds with missing values. The proposed joint attention learning outperforms nine state-of-the-art imputation methods across several missing value types and rates (10%-50%) on twelve data sets. Real electronic health records data with missing values yield the best classification accuracy when imputed using the proposed attention learning compared to other statistical, machine learning, and deep imputation methods. This paper highlights the heterogeneity of tabular data sets to recommend imputation methods based on missing value types and data characteristics.

Via

Access Paper or Ask Questions

Transfer Learning of Tabular Data by Finetuning Large Language Models

Jan 12, 2025

Shourav B. Rabbani, Ibna Kowsar, Manar D. Samad

Abstract:Despite the artificial intelligence (AI) revolution, deep learning has yet to achieve much success with tabular data due to heterogeneous feature space and limited sample sizes without viable transfer learning. The new era of generative AI, powered by large language models (LLM), brings unprecedented learning opportunities to diverse data and domains. This paper investigates the effectiveness of an LLM application programming interface (API) and transfer learning of LLM in tabular data classification. LLM APIs respond to input text prompts with tokenized data and instructions, whereas transfer learning finetunes an LLM for a target classification task. This paper proposes an end-to-end finetuning of LLM to demonstrate cross-data transfer learning on ten benchmark data sets when large pre-trained tabular data models do not exist to facilitate transfer learning. The proposed LLM finetuning method outperforms state-of-the-art machine and deep learning methods on tabular data with less than ten features - a standard feature size for tabular data sets. The transfer learning approach uses a fraction of the computational cost of other deep learning or API-based solutions while ensuring competitive or superior classification performance.

Via

Access Paper or Ask Questions