Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Al Zadid Sultan Bin Habib

BSTabDiff: Block-Subunit Diffusion Priors for High-Dimensional Tabular Data Generation

Jun 08, 2026

Al Zadid Sultan Bin Habib, Md Younus Ahamed, Prashnna Gyawali, Gianfranco Doretto, Donald A. Adjeroh

Abstract:High-Dimensional Low-Sample Size (HDLSS) tabular domains (e.g., omics) are characterized by $n \ll m$, where $n$ = number of samples, and $m$ = number of features. Such domains often exhibit strong local correlation groups, sparse cross-group dependencies, heavy-tailed non-Gaussian marginals, heteroscedastic noise, and structured missingness, making direct density learning in $\mathbb{R}^m$ ill-conditioned since $n \ll m$. We propose BSTabDiff, a block-subunit generative framework that partitions the $m$ observed features into $M$ latent blocks ($M \ll m$) and generates each block via a shared low-dimensional subunit variable, concentrating global dependence learning in the compact block-latent space $\mathbb{R}^M$ while decoding to the full feature space with copula-driven dependence, flexible per-feature marginals, and explicit missingness mechanisms. BSTabDiff supports modern deep priors on block latents, including diffusion and normalizing flows, enabling stable synthesis and controllable benchmark generation in the HDLSS regime. Empirically, BSTabDiff produces more realistic and stable high-dimensional synthetic data when compared with unstructured tabular generators on HDLSS data.

* Published as a paper at the 2nd DeLTa Workshop, ICLR 2026

Via

Access Paper or Ask Questions

GOTabPFN: From Feature Ordering to Compact Tokenization for Tabular Foundation Models on High-Dimensional Data

Jun 03, 2026

Al Zadid Sultan Bin Habib, Md Younus Ahamed, Prashnna Kumar Gyawali, Gianfranco Doretto, Donald A. Adjeroh

Abstract:We investigate how to make small tabular foundation models effective for High-Dimensional, Low-Sample Size (HDLSS) tabular prediction without retraining large backbones. We introduce Graph-guided Ordering with Local Refinement (GO-LR), show its equivalence to weighted Minimum Linear Arrangement, and interpret the practical solver as a TSP-path-style surrogate. We propose GOTabPFN,which builds on GO-LR, and a Neuro-Inspired Subunit Compression (NSC) unit to pool locally adjacent ordered features into meta-features, yielding a compact representation that makes TabPFN-style prediction practical in HDLSS regimes. Across tabular benchmarks, GOTabPFN improves stability and accuracy under tight token budgets.

* Accepted to the 43rd International Conference on Machine Learning (ICML 2026). Code and resources are available at: GitHub: https://github.com/zadid6pretam/GOTabPFN; PyPI: https://pypi.org/project/gotabpfn/; Project webpage: https://www.zadidhabib.com/gotabpfn.html; Hugging Face Space: https://huggingface.co/spaces/zadid6pretam/GOTabPFN and https://zadid6pretam-GOTabPFN.hf.space

Via

Access Paper or Ask Questions

DynaTab: Dynamic Feature Ordering as Neural Rewiring for High-Dimensional Tabular Data

May 05, 2026

Al Zadid Sultan Bin Habib, Gianfranco Doretto, Donald A. Adjeroh

Abstract:High-dimensional tabular data lacks a natural feature order, limiting the applicability of permutation-sensitive deep learning models. We propose DynaTab, a dynamic feature ordering-enabled architecture inspired by neural rewiring. We introduce a lightweight criterion that predicts when feature permutation will benefit a dataset by quantifying its intrinsic complexity. DynaTab dynamically reorders features via a neural rewiring algorithm and processes them through a compact, dynamic order-aware combination of separate learned positional embedding, importance-based gating, and masked attention layers, compatible with any sequence-sensitive backbone. Trained end-to-end with bespoke dynamic feature ordering (DFO) and dispersion losses, DynaTab achieves statistically significant gains, particularly on high-dimensional datasets, where it is benchmarked against 45 state-of-the-art baselines across 36 different real-world tabular datasets. Our results position DynaTab as a compelling new paradigm for high-dimensional tabular deep learning.

* This paper has been accepted for archival publication in the PMLR proceedings of the AAAI 2026 Neuro for AI \& AI for Neuro: Towards Multi-Modal Natural Intelligence (NeuroAI) Workshop, Code: https://github.com/zadid6pretam/DynaTab, PyPI: pip install dynatab

Via

Access Paper or Ask Questions

TabSeq: A Framework for Deep Learning on Tabular Data via Sequential Ordering

Oct 17, 2024

Al Zadid Sultan Bin Habib, Kesheng Wang, Mary-Anne Hartley, Gianfranco Doretto, Donald A. Adjeroh

Figure 1 for TabSeq: A Framework for Deep Learning on Tabular Data via Sequential Ordering

Figure 2 for TabSeq: A Framework for Deep Learning on Tabular Data via Sequential Ordering

Figure 3 for TabSeq: A Framework for Deep Learning on Tabular Data via Sequential Ordering

Figure 4 for TabSeq: A Framework for Deep Learning on Tabular Data via Sequential Ordering

Abstract:Effective analysis of tabular data still poses a significant problem in deep learning, mainly because features in tabular datasets are often heterogeneous and have different levels of relevance. This work introduces TabSeq, a novel framework for the sequential ordering of features, addressing the vital necessity to optimize the learning process. Features are not always equally informative, and for certain deep learning models, their random arrangement can hinder the model's learning capacity. Finding the optimum sequence order for such features could improve the deep learning models' learning process. The novel feature ordering technique we provide in this work is based on clustering and incorporates both local ordering and global ordering. It is designed to be used with a multi-head attention mechanism in a denoising autoencoder network. Our framework uses clustering to align comparable features and improve data organization. Multi-head attention focuses on essential characteristics, whereas the denoising autoencoder highlights important aspects by rebuilding from distorted inputs. This method improves the capability to learn from tabular data while lowering redundancy. Our research, demonstrating improved performance through appropriate feature sequence rearrangement using raw antibody microarray and two other real-world biomedical datasets, validates the impact of feature ordering. These results demonstrate that feature ordering can be a viable approach to improved deep learning of tabular data.

* This paper has been accepted for presentation at the 26th International Conference on Pattern Recognition (ICPR 2024) in Kolkata, India

Via

Access Paper or Ask Questions

Cardiovascular Disease Risk Prediction via Social Media

Sep 28, 2023

Al Zadid Sultan Bin Habib, Md Asif Bin Syed, Md Tanvirul Islam, Donald A. Adjeroh

Abstract:Researchers use Twitter and sentiment analysis to predict Cardiovascular Disease (CVD) risk. We developed a new dictionary of CVD-related keywords by analyzing emotions expressed in tweets. Tweets from eighteen US states, including the Appalachian region, were collected. Using the VADER model for sentiment analysis, users were classified as potentially at CVD risk. Machine Learning (ML) models were employed to classify individuals' CVD risk and applied to a CDC dataset with demographic information to make the comparison. Performance evaluation metrics such as Test Accuracy, Precision, Recall, F1 score, Mathew's Correlation Coefficient (MCC), and Cohen's Kappa (CK) score were considered. Results demonstrated that analyzing tweets' emotions surpassed the predictive power of demographic data alone, enabling the identification of individuals at potential risk of developing CVD. This research highlights the potential of Natural Language Processing (NLP) and ML techniques in using tweets to identify individuals with CVD risks, providing an alternative approach to traditional demographic information for public health monitoring.

* 9 pages, 3 figures, 16th International Conference on Social Computing, Behavioral-Cultural Modeling & Prediction and Behavior Representation in Modeling and Simulation (SBP-BRiMS 2023)

Via

Access Paper or Ask Questions