Junwei Ma

CausalPFN: Amortized Causal Effect Estimation via In-Context Learning

Jun 09, 2025

Vahid Balazadeh, Hamidreza Kamkari, Valentin Thomas, Benson Li, Junwei Ma, Jesse C. Cresswell, Rahul G. Krishnan

Abstract: Causal effect estimation from observational data is fundamental across various applications. However, selecting an appropriate estimator from dozens of specialized methods demands substantial manual effort and domain expertise. We present CausalPFN, a single transformer that amortizes this workflow: trained once on a large library of simulated data-generating processes that satisfy ignorability, it infers causal effects for new observational datasets out-of-the-box. CausalPFN combines ideas from Bayesian causal inference with the large-scale training protocol of prior-fitted networks (PFNs), learning to map raw observations directly to causal effects without any task-specific adjustment. Our approach achieves superior average performance on heterogeneous and average treatment effect estimation benchmarks (IHDP, Lalonde, ACIC). Moreover, it shows competitive performance for real-world policy making on uplift modeling tasks. CausalPFN provides calibrated uncertainty estimates to support reliable decision-making based on Bayesian principles. This ready-to-use model does not require any further training or tuning and takes a step toward automated causal inference (https://github.com/vdblm/CausalPFN).

TabDPT: Scaling Tabular Foundation Models

Oct 23, 2024

Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh, Hamidreza Kamkari, Alex Labach, Jesse C. Cresswell, Keyvan Golestan, Guangwei Yu, Maksims Volkovs, Anthony L. Caterini

Abstract: The challenges faced by neural networks on tabular data are well-documented and have hampered the progress of tabular foundation models. Techniques leveraging in-context learning (ICL) have shown promise here, allowing for dynamic adaptation to unseen data. ICL can provide predictions for entirely new datasets without further training or hyperparameter tuning, therefore providing very fast inference when encountering a novel task. However, scaling ICL for tabular data remains an issue: approaches based on large language models cannot efficiently process numeric tables, and tabular-specific techniques have not been able to effectively harness the power of real data to improve performance and generalization. We overcome these challenges by training tabular-specific ICL-based architectures on real data with self-supervised learning and retrieval, combining the best of both worlds. Our resulting model -- the Tabular Discriminative Pre-trained Transformer (TabDPT) -- achieves state-of-the-art performance on the CC18 (classification) and CTR23 (regression) benchmarks with no task-specific fine-tuning, demonstrating the adaptability and speed of ICL once the model is pre-trained. TabDPT also demonstrates strong scaling as both model size and amount of available data increase, pointing towards future improvements simply through the curation of larger tabular pre-training datasets and training larger models.

* Minimal TabDPT interface to provide predictions on new datasets available at the following link: https://github.com/layer6ai-labs/TabDPT

Retrieval & Fine-Tuning for In-Context Tabular Models

Jun 07, 2024

Valentin Thomas, Junwei Ma, Rasa Hosseinzadeh, Keyvan Golestan, Guangwei Yu, Maksims Volkovs, Anthony Caterini

Abstract: Tabular data is a pervasive modality spanning a wide range of domains, and the inherent diversity poses a considerable challenge for deep learning. Recent advancements using transformer-based in-context learning have shown promise on smaller and less complex datasets, but have struggled to scale to larger and more complex ones. To address this limitation, we propose a combination of retrieval and fine-tuning: we can adapt the transformer to a local subset of the data by collecting nearest neighbours, and then perform task-specific fine-tuning with this retrieved set of neighbours in context. Using TabPFN as the base model -- currently the best tabular in-context learner -- and applying our retrieval and fine-tuning scheme on top results in what we call a locally-calibrated PFN, or LoCalPFN. We conduct extensive evaluation on 95 datasets curated by TabZilla from OpenML, upon which we establish a new state-of-the-art with LoCalPFN -- even with respect to tuned tree-based models. Notably, we show a significant boost in performance compared to the base in-context model, demonstrating the efficacy of our approach and advancing the frontier of deep learning in tabular data.
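
The retrieval half of the scheme described in the abstract above can be illustrated with a minimal sketch. It is not the LoCalPFN implementation: the function name `retrieve_context`, the Euclidean metric, and the toy data are assumptions made for illustration, and the fine-tuning step is not shown. The sketch only selects the k nearest training rows that would be placed in the model's context for a given query.

```python
import math

def retrieve_context(query, train_x, train_y, k):
    """Return the k training rows (and labels) closest to `query`.

    Hypothetical helper: Euclidean distance is an assumption, not
    necessarily the metric used in the paper.
    """
    # Rank training indices by distance to the query; sorted() is stable,
    # so ties keep their original order.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(query, train_x[i]))
    nearest = order[:k]
    return [train_x[i] for i in nearest], [train_y[i] for i in nearest]

# Toy usage: the two rows nearest the origin form the local context.
train_x = [[0.0, 1.0], [5.0, 5.0], [1.0, 0.0], [9.0, 9.0]]
train_y = ["a", "b", "c", "d"]
ctx_x, ctx_y = retrieve_context([0.0, 0.0], train_x, train_y, k=2)
```

In a full pipeline, `ctx_x` and `ctx_y` would replace the whole training set as the in-context examples for that query, which is what keeps the context small on large datasets.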

TabPFGen -- Tabular Data Generation with TabPFN

Jun 07, 2024

Junwei Ma, Apoorv Dankar, George Stein, Guangwei Yu, Anthony Caterini

Abstract: Advances in deep generative modelling have not translated well to tabular data. We argue that this is caused by a mismatch in structure between popular generative models and discriminative models of tabular data. We thus devise a technique to turn TabPFN -- a highly performant transformer initially designed for in-context discriminative tabular tasks -- into an energy-based generative model, which we dub TabPFGen. This novel framework leverages the pre-trained TabPFN as part of the energy function and does not require any additional training or hyperparameter tuning, thus inheriting TabPFN's in-context learning capability. We can sample from TabPFGen analogously to other energy-based models. We demonstrate strong results on standard generative modelling tasks, including data augmentation, class-balancing, and imputation, unlocking a new frontier of tabular data generation.

Tabular Data Contrastive Learning via Class-Conditioned and Feature-Correlation Based Augmentation

Apr 30, 2024

Wei Cui, Rasa Hosseinzadeh, Junwei Ma, Tongzi Wu, Yi Sui, Keyvan Golestan

Abstract: Contrastive learning is a model pre-training technique that first creates similar views of the original data and then encourages the data and its corresponding views to be close in the embedding space. Contrastive learning has witnessed success in image and natural language data, thanks to domain-specific augmentation techniques that are both intuitive and effective. Nonetheless, in the tabular domain, the predominant augmentation technique for creating views is to corrupt tabular entries by swapping values, which is not as sound or effective. We propose a simple yet powerful improvement to this augmentation technique: corrupting tabular data conditioned on class identity. Specifically, when corrupting a specific tabular entry from an anchor row, instead of sampling a value in the same feature column uniformly at random from the entire table, we only sample from rows that are identified to be within the same class as the anchor row. We assume the semi-supervised learning setting, and adopt the pseudo-labeling technique for obtaining class identities over all table rows. We also explore the novel idea of selecting features to be corrupted based on feature correlation structures. Extensive experiments show that the proposed approach consistently outperforms the conventional corruption method for tabular data classification tasks. Our code is available at https://github.com/willtop/Tabular-Class-Conditioned-SSL.

* 14 pages, 4 algorithms, 3 figures, 5 tables
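
The class-conditioned corruption described in the abstract above can be sketched in a few lines. This is an illustrative sketch, not the authors' released code: the function name `class_conditioned_corrupt`, the fixed list of columns to corrupt, and the toy table are assumptions, and the labels are taken as given rather than produced by pseudo-labeling.

```python
import random

def class_conditioned_corrupt(rows, labels, corrupt_cols, rng):
    """Return a corrupted copy of `rows`.

    For every anchor row, each column in `corrupt_cols` is replaced by the
    value of that column in a randomly drawn row with the same label.
    """
    # Group row indices by class so donors are drawn within the class only.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    corrupted = []
    for row, label in zip(rows, labels):
        new_row = list(row)
        for col in corrupt_cols:
            # Same-class donor; it may happen to be the anchor row itself.
            donor = rng.choice(by_class[label])
            new_row[col] = rows[donor][col]
        corrupted.append(new_row)
    return corrupted

# Toy usage: corrupt only column 1; column 0 is left untouched.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
labels = [0, 0, 1, 1]
views = class_conditioned_corrupt(rows, labels, corrupt_cols=[1], rng=random.Random(0))
```

The conventional baseline mentioned in the abstract corresponds to drawing the donor from all rows instead of from `by_class[label]`.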

In-Context Data Distillation with TabPFN

Feb 10, 2024

Junwei Ma, Valentin Thomas, Guangwei Yu, Anthony Caterini

Abstract: Foundation models have revolutionized tasks in computer vision and natural language processing. However, in the realm of tabular data, tree-based models like XGBoost continue to dominate. TabPFN, a transformer model tailored for tabular data, mirrors recent foundation models in its exceptional in-context learning capability, being competitive with XGBoost's performance without the need for task-specific training or hyperparameter tuning. Despite its promise, TabPFN's applicability is hindered by its data size constraint, limiting its use in real-world scenarios. To address this, we present in-context data distillation (ICD), a novel methodology that effectively eliminates these constraints by optimizing TabPFN's context. ICD efficiently enables TabPFN to handle significantly larger datasets with a fixed memory budget, improving TabPFN's quadratic memory complexity but at the cost of a linear number of tuning steps. Notably, TabPFN, enhanced with ICD, demonstrates very strong performance against established tree-based models and modern deep learning methods on 48 large tabular datasets from OpenML.

Attributed Network Embedding Model for Exposing COVID-19 Spread Trajectory Archetypes

Sep 21, 2022

Junwei Ma, Bo Li, Qingchun Li, Chao Fan, Ali Mostafavi

Abstract: The spread of COVID-19 revealed that transmission risk patterns are not homogeneous across different cities and communities, and various heterogeneous features can influence the spread trajectories. Hence, for predictive pandemic monitoring, it is essential to explore latent heterogeneous features in cities and communities that distinguish their specific pandemic spread trajectories. To this end, this study creates a network embedding model capturing cross-county visitation networks, as well as heterogeneous features to uncover clusters of counties in the United States based on their pandemic spread transmission trajectories. First, we collected and computed location intelligence features from 2,787 counties from March 3 to June 29, 2020 (initial wave). Second, we constructed a human visitation network, which incorporated county features as node attributes, and visits between counties as network edges. Our attributed network embedding approach integrates both typological characteristics of the cross-county visitation network, as well as heterogeneous features. We conducted clustering analysis on the attributed network embeddings to reveal four archetypes of spread risk trajectories corresponding to four clusters of counties. Subsequently, we identified four features as important features underlying the distinctive transmission risk patterns among the archetypes. The attributed network embedding approach and the findings identify and explain the non-homogeneous pandemic risk trajectories across counties for predictive pandemic monitoring. The study also contributes to data-driven and deep learning-based approaches for pandemic analytics to complement the standard epidemiological models for policy analysis in pandemics.

X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval

Mar 28, 2022

Satya Krishna Gorti, Noel Vouitsis, Junwei Ma, Keyvan Golestan, Maksims Volkovs, Animesh Garg, Guangwei Yu

Abstract: In text-video retrieval, the objective is to learn a cross-modal similarity function between a text and a video that ranks relevant text-video pairs higher than irrelevant pairs. However, videos inherently express a much wider gamut of information than texts. Instead, texts often capture sub-regions of entire videos and are most semantically similar to certain frames within videos. Therefore, for a given text, a retrieval model should focus on the text's most semantically similar video sub-regions to make a more relevant comparison. Yet, most existing works aggregate entire videos without directly considering text. Common text-agnostic aggregation schemes include mean-pooling or self-attention over the frames, but these are likely to encode misleading visual information not described in the given text. To address this, we propose a cross-modal attention model called X-Pool that reasons between a text and the frames of a video. Our core mechanism is a scaled dot product attention for a text to attend to its most semantically similar frames. We then generate an aggregated video representation conditioned on the text's attention weights over the frames. We evaluate our method on three benchmark datasets, MSR-VTT, MSVD and LSMDC, achieving new state-of-the-art results by up to 12% in relative improvement in Recall@1. Our findings thereby highlight the importance of joint text-video reasoning to extract important visual cues according to text. Full code and demo can be found at: https://layer6ai-labs.github.io/xpool/

* CVPR 2022
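
The core mechanism named in the abstract above, scaled dot product attention from a text to the frames of a video, can be sketched as follows. This is a simplified illustration rather than the X-Pool model: it omits any learned query, key, and value projections, and the function name `text_conditioned_pool` and the toy vectors are assumptions.

```python
import math

def text_conditioned_pool(text_vec, frame_vecs):
    """Pool frame embeddings into one video embedding, weighted by how well
    each frame matches the text (scaled dot product attention, no projections)."""
    d = len(text_vec)
    # Scaled dot product score between the text and each frame.
    scores = [sum(t * f for t, f in zip(text_vec, frame)) / math.sqrt(d)
              for frame in frame_vecs]
    # Numerically stable softmax over frames.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attention-weighted sum of the frame embeddings.
    pooled = [sum(w * frame[j] for w, frame in zip(weights, frame_vecs))
              for j in range(d)]
    return pooled, weights

# Toy usage: the first frame is aligned with the text, the last is opposed.
text = [1.0, 0.0]
frames = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
pooled, weights = text_conditioned_pool(text, frames)
```

Mean-pooling, the text-agnostic baseline mentioned in the abstract, corresponds to using equal weights for every frame regardless of the text.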

Weakly Supervised Action Selection Learning in Video

May 06, 2021

Junwei Ma, Satya Krishna Gorti, Maksims Volkovs, Guangwei Yu

Abstract: Localizing actions in video is a core task in computer vision. The weakly supervised temporal localization problem investigates whether this task can be adequately solved with only video-level labels, significantly reducing the amount of expensive and error-prone annotation that is required. A common approach is to train a frame-level classifier where frames with the highest class probability are selected to make a video-level prediction. Frame-level activations are then used for localization. However, the absence of frame-level annotations causes the classifier to impart class bias on every frame. To address this, we propose the Action Selection Learning (ASL) approach to capture the general concept of action, a property we refer to as "actionness". Under ASL, the model is trained with a novel class-agnostic task to predict which frames will be selected by the classifier. Empirically, we show that ASL outperforms leading baselines on two popular benchmarks, THUMOS-14 and ActivityNet-1.2, with 10.3% and 5.7% relative improvement respectively. We further analyze the properties of ASL and demonstrate the importance of actionness. Full code for this work is available here: https://github.com/layer6ai-labs/ASL.

* CVPR 2021

Cross-Class Relevance Learning for Temporal Concept Localization

Nov 19, 2019

Junwei Ma, Satya Krishna Gorti, Maksims Volkovs, Ilya Stanevich, Guangwei Yu

Abstract: We present a novel Cross-Class Relevance Learning approach for the task of temporal concept localization. Most localization architectures rely on feature extraction layers followed by a classification layer which outputs class probabilities for each segment. However, in many real-world applications classes can exhibit complex relationships that are difficult to model with this architecture. In contrast, we propose to incorporate target class and class-related features as input, and learn a pairwise binary model to predict general segment to class relevance. This facilitates learning of shared information between classes, and allows for arbitrary class-specific feature engineering. We apply this approach to the 3rd YouTube-8M Video Understanding Challenge together with other leading models, and achieve first place out of over 280 teams. In this paper we describe our approach and show some empirical results.
