Daniel Kim

ScarNet: A Novel Foundation Model for Automated Myocardial Scar Quantification from LGE in Cardiac MRI

Jan 02, 2025

Neda Tavakoli, Amir Ali Rahsepar, Brandon C. Benefield, Daming Shen, Santiago López-Tapia, Florian Schiffers, Jeffrey J. Goldberger, Christine M. Albert, Edwin Wu, Aggelos K. Katsaggelos(+2 more)

Abstract: Background: Late Gadolinium Enhancement (LGE) imaging is the gold standard for assessing myocardial fibrosis and scarring, with left ventricular (LV) LGE extent predicting major adverse cardiac events (MACE). Despite its importance, routine LGE-based LV scar quantification is hindered by labor-intensive manual segmentation and inter-observer variability. Methods: We propose ScarNet, a hybrid model combining a transformer-based encoder from the Medical Segment Anything Model (MedSAM) with a convolution-based U-Net decoder, enhanced by tailored attention blocks. ScarNet was trained on 552 ischemic cardiomyopathy patients with expert segmentations of myocardial and scar boundaries and tested on 184 separate patients. Results: ScarNet achieved robust scar segmentation in 184 test patients, yielding a median Dice score of 0.912 (IQR: 0.863–0.944), significantly outperforming MedSAM (median Dice = 0.046, IQR: 0.043–0.047) and nnU-Net (median Dice = 0.638, IQR: 0.604–0.661). ScarNet demonstrated lower bias (-0.63%) and coefficient of variation (4.3%) compared to MedSAM (bias: -13.31%, CoV: 130.3%) and nnU-Net (bias: -2.46%, CoV: 20.3%). In Monte Carlo simulations with noise perturbations, ScarNet achieved significantly higher scar Dice (0.892 ± 0.053, CoV = 5.9%) than MedSAM (0.048 ± 0.112, CoV = 233.3%) and nnU-Net (0.615 ± 0.537, CoV = 28.7%). Conclusion: ScarNet outperformed MedSAM and nnU-Net in accurately segmenting myocardial and scar boundaries in LGE images. The model exhibited robust performance across diverse image qualities and scar patterns.

* 31 pages, 8 figures
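
For readers unfamiliar with the Dice score reported in the ScarNet abstract, the following is a minimal sketch of how the metric is commonly computed for two binary masks. The function name `dice_score` and the flat-list mask representation are assumptions made for illustration; this is not the authors' evaluation code.

```python
def dice_score(pred, truth):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks.

    Masks are flat sequences of 0/1 values (a hypothetical representation
    chosen for this sketch; real pipelines typically use image arrays).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labelled as scar in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total number of scar pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0
    return 2.0 * intersection / total


predicted = [1, 1, 0, 0]
reference = [1, 0, 0, 0]
print(dice_score(predicted, reference))
```

A score of 1.0 means the two masks overlap exactly, and 0.0 means they share no foreground pixels.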

FinLoRA: Finetuning Quantized Financial Large Language Models Using Low-Rank Adaptation

Dec 16, 2024

Dannong Wang, Daniel Kim, Bo Jin, Xingjian Zhao, Tianfan Fu, Steve Yang, Xiao-Yang Liu

Abstract: Finetuned large language models (LLMs) have shown remarkable performance in financial tasks, such as sentiment analysis and information retrieval. Due to privacy concerns, finetuning and deploying Financial LLMs (FinLLMs) locally are crucial for institutions. However, finetuning FinLLMs poses challenges including GPU memory constraints and long input sequences. In this paper, we employ quantized low-rank adaptation (QLoRA) to finetune FinLLMs, which leverages low-rank matrix decomposition and quantization techniques to significantly reduce computational requirements while maintaining high model performance. We also employ data and pipeline parallelism to enable local finetuning using cost-effective, widely accessible GPUs. Experiments on financial datasets demonstrate that our method achieves substantial improvements in accuracy, GPU memory usage, and time efficiency, underscoring the potential of low-rank methods for scalable and resource-efficient LLM finetuning.
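
The low-rank decomposition mentioned in the FinLoRA abstract can be illustrated with a small sketch. It assumes the standard LoRA formulation, in which a frozen weight matrix W is adapted as W + scale * (B A) with B of shape d_out x r and A of shape r x d_in. The function names, the toy matrices, and the scale value are hypothetical, and the quantization step that QLoRA adds is not shown.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, low_rank) trainable-parameter counts for one weight matrix."""
    full = d_out * d_in              # updating W directly
    low_rank = rank * (d_out + d_in)  # updating only B (d_out x r) and A (r x d_in)
    return full, low_rank


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_effective_weight(w, b, a, scale):
    """Effective weight W + scale * (B A); W itself stays frozen."""
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


# Toy example with rank 1; all values are illustrative only.
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]
a = [[3.0, 4.0]]
print(lora_effective_weight(w, b, a, scale=0.5))
print(lora_param_counts(4096, 4096, 8))
```

For a 4096 x 4096 layer at rank 8, the sketch reports 65,536 trainable parameters against 16,777,216 for the full matrix, which is where the memory savings of low-rank methods come from.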

DRL-STNet: Unsupervised Domain Adaptation for Cross-modality Medical Image Segmentation via Disentangled Representation Learning

Sep 26, 2024

Hui Lin, Florian Schiffers, Santiago López-Tapia, Neda Tavakoli, Daniel Kim, Aggelos K. Katsaggelos

Abstract: Unsupervised domain adaptation (UDA) is essential for medical image segmentation, especially in cross-modality data scenarios. UDA aims to transfer knowledge from a labeled source domain to an unlabeled target domain, thereby reducing the dependency on extensive manual annotations. This paper presents DRL-STNet, a novel framework for cross-modality medical image segmentation that leverages generative adversarial networks (GANs), disentangled representation learning (DRL), and self-training (ST). Our method uses DRL within a GAN to translate images from the source to the target modality. The segmentation model is first trained on these translated images with the corresponding source labels, then fine-tuned iteratively using a combination of synthetic images with real labels and real images with pseudo-labels. The proposed framework exhibits superior performance in abdominal organ segmentation on the FLARE challenge dataset, surpassing state-of-the-art methods by 11.4% in the Dice similarity coefficient and by 13.1% in the Normalized Surface Dice metric, achieving scores of 74.21% and 80.69%, respectively. The average running time is 41 seconds, and the area under the GPU memory-time curve is 11,292 MB. These results indicate the potential of DRL-STNet for enhancing cross-modality medical image segmentation tasks.

* MICCAI 2024 Challenge, FLARE Challenge, Unsupervised domain adaptation, Organ segmentation, Feature disentanglement, Self-training

Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Aug 20, 2024

Qianqian Xie, Dong Li, Mengxi Xiao, Zihao Jiang, Ruoyu Xiang, Xiao Zhang, Zhengyu Chen, Yueru He, Weiguang Han, Yuzhe Yang(+29 more)

Abstract: Large language models (LLMs) have advanced financial applications, yet they often lack sufficient financial knowledge and struggle with tasks involving multi-modal inputs like tables and time series data. To address these limitations, we introduce Open-FinLLMs, a series of Financial LLMs. We begin with FinLLaMA, pre-trained on a 52 billion token financial corpus, incorporating text, tables, and time-series data to embed comprehensive financial knowledge. FinLLaMA is then instruction fine-tuned with 573K financial instructions, resulting in FinLLaMA-instruct, which enhances task performance. Finally, we present FinLLaVA, a multimodal LLM trained with 1.43M image-text instructions to handle complex financial data types. Extensive evaluations demonstrate FinLLaMA's superior performance over LLaMA3-8B, LLaMA3.1-8B, and BloombergGPT in both zero-shot and few-shot settings across 19 and 4 datasets, respectively. FinLLaMA-instruct outperforms GPT-4 and other Financial LLMs on 15 datasets. FinLLaVA excels in understanding tables and charts across 4 multimodal tasks. Additionally, FinLLaMA achieves impressive Sharpe Ratios in trading simulations, highlighting its robust financial application capabilities. We will continually maintain and improve our models and benchmarks to support ongoing innovation in academia and industry.

* 33 pages, 13 figures
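
The Sharpe Ratio cited in the Open-FinLLMs abstract is a standard risk-adjusted return measure. The abstract does not say how it was computed, so the sketch below assumes the common textbook form: mean excess return divided by the sample standard deviation of excess returns, scaled by the square root of the number of periods per year. The function name, the default of 252 trading days, and the sample returns are all assumptions for illustration.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from a list of per-period returns.

    risk_free_rate is expressed per period (an assumption of this sketch).
    Requires at least two returns, since a sample standard deviation is used.
    """
    excess = [r - risk_free_rate for r in returns]
    spread = statistics.stdev(excess)  # sample standard deviation
    if spread == 0:
        raise ValueError("returns have zero variance; Sharpe ratio is undefined")
    return statistics.mean(excess) / spread * math.sqrt(periods_per_year)


# Hypothetical daily returns from a simulated strategy.
daily_returns = [0.01, 0.02, 0.03]
print(sharpe_ratio(daily_returns))
```

A higher value indicates more return per unit of volatility, and values are only comparable when the same period length and risk-free rate are used.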

Pegasus-v1 Technical Report

Apr 23, 2024

Raehyuk Jung, Hyojun Go, Jaehyuk Yi, Jiho Jang, Daniel Kim, Jay Suh, Aiden Lee, Cooper Han, Jae Lee, Jeff Kim(+34 more)

Abstract: This technical report introduces Pegasus-1, a multimodal language model specialized in video content understanding and interaction through natural language. Pegasus-1 is designed to address the unique challenges posed by video data, such as interpreting spatiotemporal information, to offer nuanced video content comprehension across various lengths. This technical report gives an overview of Pegasus-1's architecture, training strategies, and its performance in benchmarks on video conversation, zero-shot video question answering, and video summarization. We also explore qualitative characteristics of Pegasus-1, demonstrating its capabilities as well as its limitations, in order to give readers a balanced view of its current state and its future direction.

2-Step Sparse-View CT Reconstruction with a Domain-Specific Perceptual Network

Dec 08, 2020

Haoyu Wei, Florian Schiffers, Tobias Würfl, Daming Shen, Daniel Kim, Aggelos K. Katsaggelos, Oliver Cossairt

Abstract: Computed tomography is widely used to examine internal structures in a non-destructive manner. To obtain high-quality reconstructions, one typically has to acquire a densely sampled trajectory to avoid angular undersampling. However, many scenarios require a sparse-view measurement, leading to streak artifacts if unaccounted for. Current methods do not make full use of the domain-specific information, and hence fail to provide reliable reconstructions for highly undersampled data. We present a novel framework for sparse-view tomography by decoupling the reconstruction into two steps: First, we overcome its ill-posedness using a super-resolution network, SIN, trained on the sparse projections. The intermediate result allows for a closed-form tomographic reconstruction with preserved details and highly reduced streak artifacts. Second, a refinement network, PRN, trained on the reconstructions reduces any remaining artifacts. We further propose a lightweight variant of the perceptual loss that enhances domain-specific information, boosting restoration accuracy. Our experiments demonstrate an improvement over current solutions by 4 dB.

Historic Emergence of Diversity in Painting: Heterogeneity in Chromatic Distance in Images and Characterization of Massive Painting Data Set

Sep 14, 2018

Byunghwee Lee, Daniel Kim, Seunghye Sun, Hawoong Jeong, Juyong Park

Abstract: Painting is an art form that has long functioned as a major channel for the creative expression and communication of humans, its evolution taking place under an interplay with the science, technology, and social environments of the times. Therefore, understanding the process based on comprehensive data could shed light on how humans acted and manifested creatively under changing conditions. Yet, there exist few systematic frameworks that characterize the process for painting, which would require robust statistical methods for defining painting characteristics and identifying humans' creative developments, and data of high quality and sufficient quantity. Here we propose that the color contrast of a painting image, signifying the heterogeneity in inter-pixel chromatic distance, can be a useful representation of its style, integrating both the color and geometry. From the color contrasts of paintings from a large-scale, comprehensive archive of 179,853 high-quality images spanning several centuries, we characterize the temporal evolutionary patterns of paintings, and present a deep study of an extraordinary expansion in creative diversity and individuality that came to define the modern era.

* 11 pages, 7 figures
