Huan He

N3C Natural Language Processing

CDEMapper: Enhancing NIH Common Data Element Normalization using Large Language Models

Nov 30, 2024

Yan Wang, Jimin Huang, Huan He, Vincent Zhang, Yujia Zhou, Xubing Hao, Pritham Ram, Lingfei Qian, Qianqian Xie, Ruey-Ling Weng(+6 more)

Abstract:Common Data Elements (CDEs) standardize data collection and sharing across studies, enhancing data interoperability and improving research reproducibility. However, implementing CDEs presents challenges due to the broad range and variety of data elements. This study aims to develop an effective and efficient mapping tool to bridge the gap between local data elements and National Institutes of Health (NIH) CDEs. We propose CDEMapper, a large language model (LLM) powered mapping tool designed to assist in mapping local data elements to NIH CDEs. CDEMapper has three core modules: (1) CDE indexing and embeddings. NIH CDEs were indexed and embedded to support semantic search; (2) CDE recommendations. The tool combines Elasticsearch (BM25 similarity methods) with state-of-the-art GPT services to recommend candidate CDEs and their permissible values; and (3) Human review. Users review and select the NIH CDEs and values that best match their data elements and value sets. We evaluate the tool's recommendation accuracy against manually annotated mapping results. CDEMapper offers a publicly available, LLM-powered, and intuitive user interface that consolidates essential and advanced mapping services into a streamlined pipeline. It provides a step-by-step, quality-assured mapping workflow designed with a user-centered approach. The evaluation results demonstrated that augmenting BM25 with GPT embeddings and a ranker consistently enhances CDEMapper's mapping accuracy in three different mapping settings across four evaluation datasets. This work opens up the potential of using LLMs to assist with CDE recommendation and human curation when aligning local data elements with NIH CDEs. Additionally, this effort enhances clinical research data interoperability and helps researchers better understand the gaps between local data elements and NIH CDEs.

* 11 pages, 4 figures
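
The recommendation module described in the abstract pairs lexical BM25 retrieval with embedding-based ranking. The sketch below is only an illustration of that general idea, not the CDEMapper implementation: it scores candidates with a classic Okapi BM25 formula written in plain Python (using `k1=1.2`, `b=0.75`, the Elasticsearch defaults), and it substitutes character-trigram counts for GPT embeddings. The CDE names, the `recommend` function, and the `alpha` mixing weight are all assumptions made up for this example.

```python
import math
from collections import Counter

# Hypothetical CDE names, invented for illustration only.
CANDIDATE_CDES = [
    "Age at enrollment in years",
    "Sex assigned at birth",
    "Systolic blood pressure measurement",
    "Body mass index value",
]


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Classic Okapi BM25 score of every document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def trigram_vector(text):
    """Stand-in for an embedding model: character-trigram counts."""
    s = f"  {text.lower()} "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(query, docs, top_k=3, alpha=0.5):
    """Rank docs by a weighted mix of normalized BM25 and vector similarity."""
    lexical = bm25_scores(query, docs)
    max_lex = max(lexical) or 1.0  # avoid dividing by zero when nothing matches
    q_vec = trigram_vector(query)
    combined = []
    for doc, lex in zip(docs, lexical):
        sem = cosine(q_vec, trigram_vector(doc))
        combined.append((alpha * lex / max_lex + (1 - alpha) * sem, doc))
    combined.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in combined[:top_k]]
```

Calling `recommend("patient age enrollment", CANDIDATE_CDES)` returns a short candidate list that a human reviewer would then confirm or reject, mirroring the third module in the abstract.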

MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation

Jun 11, 2024

Lu Li, Tianyu Zhang, Zhiqi Bu, Suyuchen Wang, Huan He, Jie Fu, Yonghui Wu, Jiang Bian, Yong Chen, Yoshua Bengio

Abstract:Model merging has emerged as an effective approach to combine multiple single-task models, fine-tuned from the same pre-trained model, into a multitask model. This process typically involves computing a weighted average of the model parameters without any additional training. Existing model-merging methods focus on enhancing average task accuracy. However, interference and conflicts between the objectives of different tasks can lead to trade-offs during model merging. In real-world applications, a set of solutions with various trade-offs can be more informative, helping practitioners make decisions based on diverse preferences. In this paper, we introduce a novel low-compute algorithm, Model Merging with Amortized Pareto Front (MAP). MAP identifies a Pareto set of scaling coefficients for merging multiple models to reflect the trade-offs. The core component of MAP is approximating the evaluation metrics of the various tasks using a quadratic approximation surrogate model derived from a pre-selected set of scaling coefficients, enabling amortized inference. Experimental results on vision and natural language processing tasks show that MAP can accurately identify the Pareto front. To further reduce the required computation of MAP, we propose (1) a Bayesian adaptive sampling algorithm and (2) a nested merging scheme with multiple stages.
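
The core idea in the abstract (fit a quadratic surrogate of each task metric from a few evaluated scaling coefficients, then read a Pareto set off the cheap surrogate) can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the authors' code: `task_metrics` is a hypothetical stand-in for actually evaluating a merged model, two scaling coefficients are used, and the surrogate is fitted by ordinary least squares through the normal equations.

```python
def quad_features(c1, c2):
    """Monomials of a full quadratic in two scaling coefficients."""
    return [1.0, c1, c2, c1 * c1, c1 * c2, c2 * c2]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_quadratic(points, values):
    """Least-squares fit via the normal equations (X^T X) w = X^T y."""
    rows = [quad_features(*p) for p in points]
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(d)]
    return solve(xtx, xty)


def predict(w, c1, c2):
    return sum(wi * fi for wi, fi in zip(w, quad_features(c1, c2)))


def pareto_front(candidates):
    """Keep (coefficients, metrics) pairs that no other pair dominates (higher is better)."""
    front = []
    for coeffs, metrics in candidates:
        dominated = any(
            all(o >= v for o, v in zip(other, metrics))
            and any(o > v for o, v in zip(other, metrics))
            for _, other in candidates
        )
        if not dominated:
            front.append((coeffs, metrics))
    return front


def task_metrics(c1, c2):
    """Hypothetical stand-in for evaluating the merged model on two tasks."""
    acc1 = 0.9 - 0.5 * (c1 - 1.0) ** 2 - 0.3 * c2 ** 2
    acc2 = 0.9 - 0.3 * c1 ** 2 - 0.5 * (c2 - 1.0) ** 2
    return acc1, acc2


# 1) Evaluate the expensive metrics on a small pre-selected set of coefficients.
samples = [(a, b) for a in (0.0, 0.5, 1.0) for b in (0.0, 0.5, 1.0)]
observed = [task_metrics(*s) for s in samples]
# 2) Fit one quadratic surrogate per task.
w1 = fit_quadratic(samples, [m[0] for m in observed])
w2 = fit_quadratic(samples, [m[1] for m in observed])
# 3) Score a dense grid with the surrogates and keep the non-dominated points.
grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
candidates = [(c, (predict(w1, *c), predict(w2, *c))) for c in grid]
front = pareto_front(candidates)
```

Only nine real evaluations are needed here; the 121 grid points are scored by the surrogates, which is the amortization the abstract refers to. The Bayesian adaptive sampling and nested merging extensions are not shown.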

Me LLaMA: Foundation Large Language Models for Medical Applications

Feb 20, 2024

Qianqian Xie, Qingyu Chen, Aokun Chen, Cheng Peng, Yan Hu, Fongci Lin, Xueqing Peng, Jimin Huang, Jeffrey Zhang, Vipina Keloth(+5 more)

Abstract:Recent large language models (LLMs) like ChatGPT and LLaMA have shown great promise in many AI applications. However, their performance on medical tasks is suboptimal and can be further improved by training on large domain-specific datasets. This study introduces Me LLaMA, a medical LLM family including foundation models - Me LLaMA 13/70B and their chat-enhanced versions - Me LLaMA 13/70B-chat, developed through the continual pre-training and instruction tuning of LLaMA2 using large medical data. Our domain-specific data suite for training and evaluation includes a large-scale continual pre-training dataset with 129B tokens, an instruction tuning dataset with 214k samples, and a medical evaluation benchmark (MIBE) across six tasks with 14 datasets. Our extensive evaluation using MIBE shows that Me LLaMA models surpass existing open-source medical LLMs in zero-shot and few-shot learning and outperform commercial giants like ChatGPT on 6 out of 8 datasets and GPT-4 on 3 out of 8 datasets. In addition, we empirically investigated the catastrophic forgetting problem, and our results show that Me LLaMA models outperform other medical LLMs. Me LLaMA is one of the first and largest open-source foundational LLMs designed for the medical domain, using both biomedical and clinical data. It exhibits superior performance across both general and medical tasks compared to other medical LLMs, rendering it an attractive choice for medical AI applications. All resources are available at: https://github.com/BIDS-Xu-Lab/Me-LLaMA.

* 18 pages, 5 figures, 7 tables

Reducing operator complexity in Algebraic Multigrid with Machine Learning Approaches

Jul 15, 2023

Ru Huang, Kai Chang, Huan He, Ruipeng Li, Yuanzhe Xi

Abstract:We propose a data-driven and machine-learning-based approach to compute non-Galerkin coarse-grid operators in algebraic multigrid (AMG) methods, addressing the well-known issue of increasing operator complexity. Guided by the AMG theory on spectrally equivalent coarse-grid operators, we have developed novel ML algorithms that utilize neural networks (NNs) combined with smooth test vectors from multigrid eigenvalue problems. The proposed method demonstrates promise in reducing the complexity of coarse-grid operators while maintaining overall AMG convergence for solving parametric partial differential equation (PDE) problems. Numerical experiments on anisotropic rotated Laplacian and linear elasticity problems are provided to showcase the performance and compare with existing methods for computing non-Galerkin coarse-grid operators.

* Sparse Operator, Attention, PDE
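
For readers unfamiliar with the term, operator complexity is commonly defined as the total number of nonzeros over all multigrid levels divided by the number of nonzeros on the finest level. The sketch below computes it for a two-level hierarchy built with the standard Galerkin product P^T A P on a 1D Laplacian. It illustrates the quantity the paper aims to reduce, not the proposed non-Galerkin ML method; the dense-list matrices and helper names are assumptions chosen to keep the example dependency-free.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nnz(a, tol=1e-14):
    """Number of entries whose magnitude exceeds tol."""
    return sum(1 for row in a for v in row if abs(v) > tol)


def laplacian_1d(n):
    """Tridiagonal (-1, 2, -1) matrix of size n x n."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
            for i in range(n)]


def linear_interpolation(n_fine):
    """Prolongation P for n_fine = 2 * n_coarse + 1; coarse point j sits at fine index 2j + 1."""
    n_coarse = (n_fine - 1) // 2
    p = [[0.0] * n_coarse for _ in range(n_fine)]
    for j in range(n_coarse):
        p[2 * j][j] = 0.5
        p[2 * j + 1][j] = 1.0
        p[2 * j + 2][j] = 0.5
    return p


a_fine = laplacian_1d(7)
p = linear_interpolation(7)
a_coarse = matmul(matmul(transpose(p), a_fine), p)  # Galerkin product P^T A P
operator_complexity = (nnz(a_fine) + nnz(a_coarse)) / nnz(a_fine)
```

In this 1D case the coarse operator stays tridiagonal, so the complexity is a modest 26/19 (about 1.37). In two and three dimensions Galerkin coarse operators tend to have wider stencils than the fine operator, which is the growth that non-Galerkin approaches try to control.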

Encoding Time-Series Explanations through Self-Supervised Model Behavior Consistency

Jun 03, 2023

Owen Queen, Thomas Hartvigsen, Teddy Koker, Huan He, Theodoros Tsiligkaridis, Marinka Zitnik

Abstract:Interpreting time series models is uniquely challenging because it requires identifying both the location of time series signals that drive model predictions and their matching to an interpretable temporal pattern. While explainers from other modalities can be applied to time series, their inductive biases do not transfer well to the inherently uninterpretable nature of time series. We present TimeX, a time series consistency model for training explainers. TimeX trains an interpretable surrogate to mimic the behavior of a pretrained time series model. It addresses the issue of model faithfulness by introducing model behavior consistency, a novel formulation that preserves relations in the latent space induced by the pretrained model with relations in the latent space induced by TimeX. TimeX provides discrete attribution maps and, unlike existing interpretability methods, it learns a latent space of explanations that can be used in various ways, such as to provide landmarks to visually aggregate similar explanations and easily recognize temporal patterns. We evaluate TimeX on 8 synthetic and real-world datasets and compare its performance against state-of-the-art interpretability methods. We also conduct case studies using physiological time series. Quantitative evaluations demonstrate that TimeX achieves the highest or second-highest performance in every metric compared to baselines across all datasets. Through case studies, we show that the novel components of TimeX show potential for training faithful, interpretable models that capture the behavior of pretrained time series models.

GNNDelete: A General Strategy for Unlearning in Graph Neural Networks

Feb 26, 2023

Jiali Cheng, George Dasoulas, Huan He, Chirag Agarwal, Marinka Zitnik

Abstract:Graph unlearning, which involves deleting graph elements such as nodes, node labels, and relationships from a trained graph neural network (GNN) model, is crucial for real-world applications where data elements may become irrelevant, inaccurate, or privacy-sensitive. However, existing methods for graph unlearning either deteriorate model weights shared across all nodes or fail to effectively delete edges due to their strong dependence on local graph neighborhoods. To address these limitations, we introduce GNNDelete, a novel model-agnostic layer-wise operator that optimizes two critical properties, namely, Deleted Edge Consistency and Neighborhood Influence, for graph unlearning. Deleted Edge Consistency ensures that the influence of deleted elements is removed from both model weights and neighboring representations, while Neighborhood Influence guarantees that the remaining model knowledge is preserved after deletion. GNNDelete updates representations to delete nodes and edges from the model while retaining the rest of the learned knowledge. We conduct experiments on seven real-world graphs, showing that GNNDelete outperforms existing approaches by up to 38.8% (AUC) on edge, node, and node feature deletion tasks, and 32.2% on distinguishing deleted edges from non-deleted ones. Additionally, GNNDelete is efficient, taking 12.3x less time and 9.3x less space than retraining GNN from scratch on WordNet18.

* Accepted to ICLR 2023

MedDiff: Generating Electronic Health Records using Accelerated Denoising Diffusion Model

Feb 08, 2023

Huan He, Shifan Zhao, Yuanzhe Xi, Joyce C Ho

Abstract:Due to patient privacy protection concerns, machine learning research in healthcare has been undeniably slower and more limited than in other application domains. High-quality, realistic, synthetic electronic health records (EHRs) can be leveraged to accelerate methodological developments for research purposes while mitigating privacy concerns associated with data sharing. The current state-of-the-art models for synthetic EHR generation are generative adversarial networks, which are notoriously difficult to train and can suffer from mode collapse. Denoising Diffusion Probabilistic Models, a class of generative models inspired by statistical thermodynamics, have recently been shown to generate high-quality synthetic samples in certain domains. It is unknown whether these can generalize to generation of large-scale, high-dimensional EHRs. In this paper, we present a novel generative model based on diffusion models that is the first successful application on electronic health records. Our model proposes a mechanism to perform class-conditional sampling to preserve label information. We also introduce a new sampling strategy to accelerate the inference speed. We empirically show that our model outperforms existing state-of-the-art synthetic EHR generation methods.

* 12 pages
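
As background for the diffusion models mentioned in the abstract, the forward (noising) process of a standard Denoising Diffusion Probabilistic Model has the closed form x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, where abar_t is the running product of (1 - beta_s). The sketch below shows only that generic forward step with the commonly used linear beta schedule; it does not reproduce MedDiff's class-conditional sampling or its accelerated sampler, and the toy record is invented for illustration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances increasing linearly from beta_start to beta_end."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """abar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, noise):
    """Closed-form forward noising of x0 to step t with the given Gaussian noise."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * x + noise_scale * e for x, e in zip(x0, noise)]


rng = random.Random(0)
record = [1.0, 0.0, 0.0, 1.0]  # toy binary code vector, invented for illustration
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
noise = [rng.gauss(0.0, 1.0) for _ in record]
early = q_sample(record, 10, alpha_bars, noise)    # still close to the record
late = q_sample(record, 999, alpha_bars, noise)    # almost pure noise
```

A generative model of this family is trained to reverse these steps; the cost of running many reverse steps is why sampling acceleration matters.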

Domain Adaptation for Time Series Under Feature and Label Shifts

Feb 06, 2023

Huan He, Owen Queen, Teddy Koker, Consuelo Cuevas, Theodoros Tsiligkaridis, Marinka Zitnik

Abstract:The transfer of models trained on labeled datasets in a source domain to unlabeled target domains is made possible by unsupervised domain adaptation (UDA). However, when dealing with complex time series models, the transferability becomes challenging due to the dynamic temporal structure that varies between domains, resulting in feature shifts and gaps in the time and frequency representations. Furthermore, tasks in the source and target domains can have vastly different label distributions, making it difficult for UDA to mitigate label shifts and recognize labels that only exist in the target domain. We present RAINCOAT, the first model for both closed-set and universal DA on complex time series. RAINCOAT addresses feature and label shifts by considering both temporal and frequency features, aligning them across domains, and correcting for misalignments to facilitate the detection of private labels. Additionally, RAINCOAT improves transferability by identifying label shifts in target domains. Our experiments with 5 datasets and 13 state-of-the-art UDA methods demonstrate that RAINCOAT can achieve an improvement in performance of up to 16.33%, and can effectively handle both closed-set and universal adaptation.

* 24 pages (13 pages main paper + 11 pages supplementary materials)
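
To make "temporal and frequency features" concrete, the sketch below extracts simple summary statistics in the time domain and amplitude and phase spectra from a discrete Fourier transform. It is a generic illustration of pairing the two views of a series, assuming a naive O(n^2) DFT and hand-picked features; it is not the RAINCOAT encoder, and the function names are made up for this example.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform of a real-valued sequence."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
            for k in range(n)]


def time_frequency_features(signal):
    """Time-domain statistics plus amplitude and phase of the non-negative frequencies."""
    n = len(signal)
    mean = sum(signal) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in signal) / n)
    spectrum = dft(signal)[: n // 2 + 1]
    amplitude = [abs(c) / n for c in spectrum]
    phase = [cmath.phase(c) for c in spectrum]
    return {"mean": mean, "std": std, "amplitude": amplitude, "phase": phase}


# A sine wave completing three cycles over 32 samples.
signal = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
feats = time_frequency_features(signal)
dominant_bin = max(range(len(feats["amplitude"])), key=feats["amplitude"].__getitem__)
```

Here `dominant_bin` identifies the frequency bin with the largest amplitude, a property that a purely time-domain summary such as the mean or standard deviation does not expose directly.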

An Efficient Nonlinear Acceleration method that Exploits Symmetry of the Hessian

Oct 22, 2022

Huan He, Shifan Zhao, Ziyuan Tang, Joyce C Ho, Yousef Saad, Yuanzhe Xi

Abstract:Nonlinear acceleration methods are powerful techniques to speed up fixed-point iterations. However, many acceleration methods require storing a large number of previous iterates and this can become impractical if computational resources are limited. In this paper, we propose a nonlinear Truncated Generalized Conjugate Residual method (nlTGCR) whose goal is to exploit the symmetry of the Hessian to reduce memory usage. The proposed method can be interpreted as either an inexact Newton or a quasi-Newton method. We show that, with the help of global strategies like residual check techniques, nlTGCR can converge globally for general nonlinear problems and that under mild conditions, nlTGCR is able to achieve superlinear convergence. We further analyze the convergence of nlTGCR in a stochastic setting. Numerical results demonstrate the superiority of nlTGCR when compared with several other competitive baseline approaches on a few problems. Our code will be available in the future.

* Optimization, Short-term recurrence method by exploiting Hessian, Numerical Analysis, Iterative Method, Quasi-Newton, Anderson Acceleration, 31 pages
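
To show what accelerating a fixed-point iteration looks like in practice, the sketch below compares a plain iteration x = g(x) with Anderson acceleration of depth one, which appears among the keywords above as a related technique. This is explicitly not nlTGCR: it is a scalar toy in which depth-one Anderson mixing reduces to a secant step, and the function names are assumptions for this example.

```python
import math


def fixed_point(g, x0, tol=1e-10, max_iter=1000):
    """Plain iteration x <- g(x); returns the estimate and the number of g evaluations."""
    x = x0
    for k in range(1, max_iter + 1):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new, k
        x = x_new
    return x, max_iter


def anderson_depth_one(g, x0, tol=1e-10, max_iter=1000):
    """Anderson acceleration that mixes the current and previous iterate (scalar case)."""
    g_prev = g(x0)
    f_prev = g_prev - x0          # residual of the previous iterate
    x = g_prev                    # the first step is a plain fixed-point update
    for k in range(2, max_iter + 1):
        gx = g(x)
        f = gx - x                # residual of the current iterate
        if abs(f) < tol:
            return gx, k
        denom = f - f_prev
        # gamma minimizes |f - gamma * (f - f_prev)|; fall back to a plain step if undefined.
        gamma = f / denom if denom != 0.0 else 0.0
        x_next = gx - gamma * (gx - g_prev)
        g_prev, f_prev = gx, f
        x = x_next
    return x, max_iter


x_plain, evals_plain = fixed_point(math.cos, 1.0)
x_accel, evals_accel = anderson_depth_one(math.cos, 1.0)
```

Both runs approach the fixed point of cos(x) near 0.7390851, but the accelerated variant needs far fewer evaluations of `g`. With deeper histories the memory cost grows with the number of stored iterates, which is the practical concern the abstract raises.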

Domain generalization Person Re-identification on Attention-aware multi-operation strategery

Oct 19, 2022

Yingchun Guo, Huan He, Ye Zhu, Yang Yu

Abstract:Domain generalization person re-identification (DG Re-ID) aims to directly deploy a model trained on the source domain to the unseen target domain with good generalization, which is a challenging problem and has practical value in a real-world deployment. In the existing DG Re-ID methods, invariant operations are effective in extracting domain generalization features, and Instance Normalization (IN) or Batch Normalization (BN) is used to alleviate the bias to unseen domains. Due to domain-specific information being used to capture discriminability of the individual source domain, the generalization ability for unseen domains is unsatisfactory. To address this problem, an Attention-aware Multi-operation Strategery (AMS) for DG Re-ID is proposed to extract more generalized features. We investigate invariant operations and construct a multi-operation module based on IN and group whitening (GW) to extract domain-invariant feature representations. Furthermore, we analyze different domain-invariant characteristics, and apply spatial attention to the IN operation and channel attention to the GW operation to enhance the domain-invariant features. The proposed AMS module can be used as a plug-and-play module to incorporate into existing network architectures. Extensive experimental results show that AMS can effectively enhance the model's generalization ability to unseen domains and significantly improve the recognition performance in DG Re-ID on three protocols with ten datasets.
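
Instance Normalization, one of the invariant operations named in the abstract, standardizes each channel of a single sample over its spatial positions. The sketch below shows why that helps with domain shift in a toy setting: a per-channel brightness and contrast change (modeled here as 3x + 10, an assumption for illustration) largely disappears after normalization. It covers only plain IN without learnable affine parameters, not group whitening or the attention components of AMS.

```python
import math


def instance_norm(feature_map, eps=1e-5):
    """Normalize each channel of one sample over its spatial positions.

    feature_map is a list of channels, each a list of rows (H x W).
    """
    normalized = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        normalized.append([[(v - mean) * scale for v in row] for row in channel])
    return normalized


# The same content seen in two hypothetical "domains" that differ by contrast and brightness.
source = [[[1.0, 2.0], [3.0, 4.0]]]
target = [[[3.0 * v + 10.0 for v in row] for row in channel] for channel in source]

source_norm = instance_norm(source)
target_norm = instance_norm(target)
```

After normalization the two feature maps agree up to the small effect of `eps`, so later layers see nearly the same input regardless of the per-instance shift.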
