Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

John Pauly

LTDA-Drive: LLMs-guided Generative Models based Long-tail Data Augmentation for Autonomous Driving

May 21, 2025

Mahmut Yurt, Xin Ye, Yunsheng Ma, Jingru Luo, Abhirup Mallik, John Pauly, Burhaneddin Yaman, Liu Ren

Abstract:3D perception plays an essential role for improving the safety and performance of autonomous driving. Yet, existing models trained on real-world datasets, which naturally exhibit long-tail distributions, tend to underperform on rare and safety-critical, vulnerable classes, such as pedestrians and cyclists. Existing studies on reweighting and resampling techniques struggle with the scarcity and limited diversity within tail classes. To address these limitations, we introduce LTDA-Drive, a novel LLM-guided data augmentation framework designed to synthesize diverse, high-quality long-tail samples. LTDA-Drive replaces head-class objects in driving scenes with tail-class objects through a three-stage process: (1) text-guided diffusion models remove head-class objects, (2) generative models insert instances of the tail classes, and (3) an LLM agent filters out low-quality synthesized images. Experiments conducted on the KITTI dataset show that LTDA-Drive significantly improves tail-class detection, achieving 34.75\% improvement for rare classes over counterpart methods. These results further highlight the effectiveness of LTDA-Drive in tackling long-tail challenges by generating high-quality and diverse data.

Via

Access Paper or Ask Questions

Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts

Sep 14, 2023

Dave Van Veen, Cara Van Uden, Louis Blankemeier, Jean-Benoit Delbrouck, Asad Aali, Christian Bluethgen, Anuj Pareek, Malgorzata Polacin, William Collins, Neera Ahuja(+5 more)

Abstract:Sifting through vast textual data and summarizing key information imposes a substantial burden on how clinicians allocate their time. Although large language models (LLMs) have shown immense promise in natural language processing (NLP) tasks, their efficacy across diverse clinical summarization tasks has not yet been rigorously examined. In this work, we employ domain adaptation methods on eight LLMs, spanning six datasets and four distinct summarization tasks: radiology reports, patient questions, progress notes, and doctor-patient dialogue. Our thorough quantitative assessment reveals trade-offs between models and adaptation methods in addition to instances where recent advances in LLMs may not lead to improved results. Further, in a clinical reader study with six physicians, we depict that summaries from the best adapted LLM are preferable to human summaries in terms of completeness and correctness. Our ensuing qualitative analysis delineates mutual challenges faced by both LLMs and human experts. Lastly, we correlate traditional quantitative NLP metrics with reader study scores to enhance our understanding of how these metrics align with physician preferences. Our research marks the first evidence of LLMs outperforming human experts in clinical text summarization across multiple tasks. This implies that integrating LLMs into clinical workflows could alleviate documentation burden, empowering clinicians to focus more on personalized patient care and other irreplaceable human aspects of medicine.

* 23 pages, 22 figures

Via

Access Paper or Ask Questions

RadAdapt: Radiology Report Summarization via Lightweight Domain Adaptation of Large Language Models

May 02, 2023

Dave Van Veen, Cara Van Uden, Maayane Attias, Anuj Pareek, Christian Bluethgen, Malgorzata Polacin, Wah Chiu, Jean-Benoit Delbrouck, Juan Manuel Zambrano Chaves, Curtis P. Langlotz(+2 more)

Abstract:We systematically investigate lightweight strategies to adapt large language models (LLMs) for the task of radiology report summarization (RRS). Specifically, we focus on domain adaptation via pretraining (on natural language, biomedical text, and clinical text) and via prompting (zero-shot, in-context learning) or parameter-efficient fine-tuning (prefix tuning, LoRA). Our results on the MIMIC-III dataset consistently demonstrate best performance by maximally adapting to the task via pretraining on clinical text and parameter-efficient fine-tuning on RRS examples. Importantly, this method fine-tunes a mere 0.32% of parameters throughout the model, in contrast to end-to-end fine-tuning (100% of parameters). Additionally, we study the effect of in-context examples and out-of-distribution (OOD) training before concluding with a radiologist reader study and qualitative analysis. Our findings highlight the importance of domain adaptation in RRS and provide valuable insights toward developing effective natural language processing solutions for clinical tasks.

* 12 pages, 9 figures

Via

Access Paper or Ask Questions

Scale-Agnostic Super-Resolution in MRI using Feature-Based Coordinate Networks

Oct 18, 2022

Dave Van Veen, Rogier van der Sluijs, Batu Ozturkler, Arjun Desai, Christian Bluethgen, Robert D. Boutin, Marc H. Willis, Gordon Wetzstein, David Lindell, Shreyas Vasanawala(+2 more)

Figure 1 for Scale-Agnostic Super-Resolution in MRI using Feature-Based Coordinate Networks

Figure 2 for Scale-Agnostic Super-Resolution in MRI using Feature-Based Coordinate Networks

Figure 3 for Scale-Agnostic Super-Resolution in MRI using Feature-Based Coordinate Networks

Figure 4 for Scale-Agnostic Super-Resolution in MRI using Feature-Based Coordinate Networks

Abstract:We propose using a coordinate network decoder for the task of super-resolution in MRI. The continuous signal representation of coordinate networks enables this approach to be scale-agnostic, i.e. one can train over a continuous range of scales and subsequently query at arbitrary resolutions. Due to the difficulty of performing super-resolution on inherently noisy data, we analyze network behavior under multiple denoising strategies. Lastly we compare this method to a standard convolutional decoder using both quantitative metrics and a radiologist study implemented in Voxel, our newly developed tool for web-based evaluation of medical images.

* Medical Imaging with Deep Learning. 2022

Via

Access Paper or Ask Questions

Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers

May 20, 2022

Arda Sahiner, Tolga Ergen, Batu Ozturkler, John Pauly, Morteza Mardani, Mert Pilanci

Figure 1 for Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers

Figure 2 for Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers

Figure 3 for Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers

Figure 4 for Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers

Abstract:Vision transformers using self-attention or its proposed alternatives have demonstrated promising results in many image related tasks. However, the underpinning inductive bias of attention is not well understood. To address this issue, this paper analyzes attention through the lens of convex duality. For the non-linear dot-product self-attention, and alternative mechanisms such as MLP-mixer and Fourier Neural Operator (FNO), we derive equivalent finite-dimensional convex problems that are interpretable and solvable to global optimality. The convex programs lead to {\it block nuclear-norm regularization} that promotes low rank in the latent feature and token dimensions. In particular, we show how self-attention networks implicitly clusters the tokens, based on their latent similarity. We conduct experiments for transferring a pre-trained transformer backbone for CIFAR-100 classification by fine-tuning a variety of convex attention heads. The results indicate the merits of the bias induced by attention compared with the existing MLP or linear heads.

* 38 pages, 2 figures. To appear in ICML 2022

Via

Access Paper or Ask Questions

Scale-Equivariant Unrolled Neural Networks for Data-Efficient Accelerated MRI Reconstruction

Apr 21, 2022

Beliz Gunel, Arda Sahiner, Arjun D. Desai, Akshay S. Chaudhari, Shreyas Vasanawala, Mert Pilanci, John Pauly

Figure 1 for Scale-Equivariant Unrolled Neural Networks for Data-Efficient Accelerated MRI Reconstruction

Figure 2 for Scale-Equivariant Unrolled Neural Networks for Data-Efficient Accelerated MRI Reconstruction

Figure 3 for Scale-Equivariant Unrolled Neural Networks for Data-Efficient Accelerated MRI Reconstruction

Figure 4 for Scale-Equivariant Unrolled Neural Networks for Data-Efficient Accelerated MRI Reconstruction

Abstract:Unrolled neural networks have enabled state-of-the-art reconstruction performance and fast inference times for the accelerated magnetic resonance imaging (MRI) reconstruction task. However, these approaches depend on fully-sampled scans as ground truth data which is either costly or not possible to acquire in many clinical medical imaging applications; hence, reducing dependence on data is desirable. In this work, we propose modeling the proximal operators of unrolled neural networks with scale-equivariant convolutional neural networks in order to improve the data-efficiency and robustness to drifts in scale of the images that might stem from the variability of patient anatomies or change in field-of-view across different MRI scanners. Our approach demonstrates strong improvements over the state-of-the-art unrolled neural networks under the same memory constraints both with and without data augmentations on both in-distribution and out-of-distribution scaled images without significantly increasing the train or inference time.

Via

Access Paper or Ask Questions

NeRP: Implicit Neural Representation Learning with Prior Embedding for Sparsely Sampled Image Reconstruction

Aug 24, 2021

Liyue Shen, John Pauly, Lei Xing

Figure 1 for NeRP: Implicit Neural Representation Learning with Prior Embedding for Sparsely Sampled Image Reconstruction

Figure 2 for NeRP: Implicit Neural Representation Learning with Prior Embedding for Sparsely Sampled Image Reconstruction

Figure 3 for NeRP: Implicit Neural Representation Learning with Prior Embedding for Sparsely Sampled Image Reconstruction

Figure 4 for NeRP: Implicit Neural Representation Learning with Prior Embedding for Sparsely Sampled Image Reconstruction

Abstract:Image reconstruction is an inverse problem that solves for a computational image based on sampled sensor measurement. Sparsely sampled image reconstruction poses addition challenges due to limited measurements. In this work, we propose an implicit Neural Representation learning methodology with Prior embedding (NeRP) to reconstruct a computational image from sparsely sampled measurements. The method differs fundamentally from previous deep learning-based image reconstruction approaches in that NeRP exploits the internal information in an image prior, and the physics of the sparsely sampled measurements to produce a representation of the unknown subject. No large-scale data is required to train the NeRP except for a prior image and sparsely sampled measurements. In addition, we demonstrate that NeRP is a general methodology that generalizes to different imaging modalities such as CT and MRI. We also show that NeRP can robustly capture the subtle yet significant image changes required for assessing tumor progression.

Via

Access Paper or Ask Questions

Hidden Convexity of Wasserstein GANs: Interpretable Generative Models with Closed-Form Solutions

Jul 12, 2021

Arda Sahiner, Tolga Ergen, Batu Ozturkler, Burak Bartan, John Pauly, Morteza Mardani, Mert Pilanci

Figure 1 for Hidden Convexity of Wasserstein GANs: Interpretable Generative Models with Closed-Form Solutions

Figure 2 for Hidden Convexity of Wasserstein GANs: Interpretable Generative Models with Closed-Form Solutions

Figure 3 for Hidden Convexity of Wasserstein GANs: Interpretable Generative Models with Closed-Form Solutions

Figure 4 for Hidden Convexity of Wasserstein GANs: Interpretable Generative Models with Closed-Form Solutions

Abstract:Generative Adversarial Networks (GANs) are commonly used for modeling complex distributions of data. Both the generators and discriminators of GANs are often modeled by neural networks, posing a non-transparent optimization problem which is non-convex and non-concave over the generator and discriminator, respectively. Such networks are often heuristically optimized with gradient descent-ascent (GDA), but it is unclear whether the optimization problem contains any saddle points, or whether heuristic methods can find them in practice. In this work, we analyze the training of Wasserstein GANs with two-layer neural network discriminators through the lens of convex duality, and for a variety of generators expose the conditions under which Wasserstein GANs can be solved exactly with convex optimization approaches, or can be represented as convex-concave games. Using this convex duality interpretation, we further demonstrate the impact of different activation functions of the discriminator. Our observations are verified with numerical results demonstrating the power of the convex interpretation, with applications in progressive training of convex architectures corresponding to linear generators and quadratic-activation discriminators for CelebA image generation. The code for our experiments is available at https://github.com/ardasahiner/ProCoGAN.

* First two authors contributed equally to this work; 30 pages, 11 figures

Via

Access Paper or Ask Questions

A Geometry-Informed Deep Learning Framework for Ultra-Sparse 3D Tomographic Image Reconstruction

May 25, 2021

Liyue Shen, Wei Zhao, Dante Capaldi, John Pauly, Lei Xing

Figure 1 for A Geometry-Informed Deep Learning Framework for Ultra-Sparse 3D Tomographic Image Reconstruction

Figure 2 for A Geometry-Informed Deep Learning Framework for Ultra-Sparse 3D Tomographic Image Reconstruction

Figure 3 for A Geometry-Informed Deep Learning Framework for Ultra-Sparse 3D Tomographic Image Reconstruction

Figure 4 for A Geometry-Informed Deep Learning Framework for Ultra-Sparse 3D Tomographic Image Reconstruction

Abstract:Deep learning affords enormous opportunities to augment the armamentarium of biomedical imaging, albeit its design and implementation have potential flaws. Fundamentally, most deep learning models are driven entirely by data without consideration of any prior knowledge, which dramatically increases the complexity of neural networks and limits the application scope and model generalizability. Here we establish a geometry-informed deep learning framework for ultra-sparse 3D tomographic image reconstruction. We introduce a novel mechanism for integrating geometric priors of the imaging system. We demonstrate that the seamless inclusion of known priors is essential to enhance the performance of 3D volumetric computed tomography imaging with ultra-sparse sampling. The study opens new avenues for data-driven biomedical imaging and promises to provide substantially improved imaging tools for various clinical imaging and image-guided interventions.

Via

Access Paper or Ask Questions

OUTCOMES: Rapid Under-sampling Optimization achieves up to 50% improvements in reconstruction accuracy for multi-contrast MRI sequences

Mar 08, 2021

Ke Wang, Enhao Gong, Yuxin Zhang, Suchadrima Banerjee, Greg Zaharchuk, John Pauly

Figure 1 for OUTCOMES: Rapid Under-sampling Optimization achieves up to 50% improvements in reconstruction accuracy for multi-contrast MRI sequences

Figure 2 for OUTCOMES: Rapid Under-sampling Optimization achieves up to 50% improvements in reconstruction accuracy for multi-contrast MRI sequences

Figure 3 for OUTCOMES: Rapid Under-sampling Optimization achieves up to 50% improvements in reconstruction accuracy for multi-contrast MRI sequences

Figure 4 for OUTCOMES: Rapid Under-sampling Optimization achieves up to 50% improvements in reconstruction accuracy for multi-contrast MRI sequences

Abstract:Multi-contrast Magnetic Resonance Imaging (MRI) acquisitions from a single scan have tremendous potential to streamline exams and reduce imaging time. However, maintaining clinically feasible scan time necessitates significant undersampling, pushing the limits on compressed sensing and other low-dimensional techniques. During MRI scanning, one of the possible solutions is by using undersampling designs which can effectively improve the acquisition and achieve higher reconstruction accuracy. However, existing undersampling optimization methods are time-consuming and the limited performance prevents their clinical applications. In this paper, we proposed an improved undersampling trajectory optimization scheme to generate an optimized trajectory within seconds and apply it to subsequent multi-contrast MRI datasets on a per-subject basis, where we named it OUTCOMES. By using a data-driven method combined with improved algorithm design, GPU acceleration, and more efficient computation, the proposed method can optimize a trajectory within 5-10 seconds and achieve 30%-50% reconstruction improvement with the same acquisition cost, which makes real-time under-sampling optimization possible for clinical applications.

* 12 pages, 5 figures

Via

Access Paper or Ask Questions