Yiming Xiao

DamageCAT: A Deep Learning Transformer Framework for Typology-Based Post-Disaster Building Damage Categorization

Apr 15, 2025

Yiming Xiao, Ali Mostafavi

Abstract:Natural disasters increasingly threaten communities worldwide, creating an urgent need for rapid, reliable building damage assessment to guide emergency response and recovery efforts. Current methods typically classify damage in binary (damaged/undamaged) or ordinal severity terms, limiting their practical utility. In fact, the determination of damage typology is crucial for response and recovery efforts. To address this important gap, this paper introduces DamageCAT, a novel framework that provides typology-based categorical damage descriptions rather than simple severity ratings. Accordingly, this study presents two key contributions: (1) the BD-TypoSAT dataset containing satellite image triplets (pre-disaster, post-disaster, and damage masks) from Hurricane Ida with four damage categories (partial roof damage, total roof damage, partial structural collapse, and total structural collapse), and (2) a hierarchical U-Net-based transformer architecture that effectively processes pre-post disaster image pairs to identify and categorize building damage. Despite significant class imbalances in the training data, our model achieved robust performance with overall metrics of 0.7921 Intersection over Union (IoU) and 0.8835 F1 scores across all categories. The model's capability to recognize intricate damage typology in less common categories is especially remarkable. The DamageCAT framework advances automated damage assessment by providing actionable, typological information that better supports disaster response decision-making and resource allocation compared to traditional severity-based approaches.

* 23 pages, 6 figures
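
The abstract above reports Intersection over Union (IoU) and F1 as its overall metrics. As a minimal sketch of how these two quantities are commonly computed for a single damage category from a predicted mask and a reference mask (the function name `iou_and_f1` and the toy masks are illustrative assumptions, not taken from the paper):

```python
def iou_and_f1(pred, target):
    """Return (IoU, F1) for two equally sized binary masks given as nested lists."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1          # pixel marked in both masks
            elif p and not t:
                fp += 1          # predicted but not in the reference
            elif t and not p:
                fn += 1          # in the reference but missed
    union = tp + fp + fn
    if union == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0, 1.0
    iou = tp / union
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou, f1


# Hypothetical 2x3 masks for one damage category.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
iou, f1 = iou_and_f1(pred, target)
print(f"IoU={iou:.4f}, F1={f1:.4f}")
```

For a single mask pair the two metrics are tied by F1 = 2·IoU / (1 + IoU). Scores averaged over several categories, as in the abstract, need not satisfy that identity exactly.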

Medical Image Classification with KAN-Integrated Transformers and Dilated Neighborhood Attention

Feb 19, 2025

Omid Nejati Manzari, Hojat Asgariandehkordi, Taha Koleilat, Yiming Xiao, Hassan Rivaz

Abstract:Convolutional networks, transformers, hybrid models, and Mamba-based architectures have demonstrated strong performance across various medical image classification tasks. However, these methods were primarily designed to classify clean images using labeled data. In contrast, real-world clinical data often involve image corruptions that are unique to multi-center studies and stem from variations in imaging equipment across manufacturers. In this paper, we introduce the Medical Vision Transformer (MedViTV2), a novel architecture incorporating Kolmogorov-Arnold Network (KAN) layers into the transformer architecture for the first time, aiming for generalized medical image classification. We have developed an efficient KAN block to reduce computational load while enhancing the accuracy of the original MedViT. Additionally, to counteract the fragility of our MedViT when scaled up, we propose an enhanced Dilated Neighborhood Attention (DiNA), an adaptation of the efficient fused dot-product attention kernel capable of capturing global context and expanding receptive fields, to scale the model effectively and address feature collapse issues. Moreover, a hierarchical hybrid strategy is introduced to stack our Local Feature Perception and Global Feature Perception blocks efficiently, balancing local and global feature perception to boost performance. Extensive experiments on 17 medical image classification datasets and 12 corrupted medical image datasets demonstrate that MedViTV2 achieved state-of-the-art results in 27 out of 29 experiments with reduced computational complexity. MedViTV2 is 44% more computationally efficient than the previous version and significantly enhances accuracy, achieving improvements of 4.6% on MedMNIST, 5.8% on NonMNIST, and 13.4% on the MedMNIST-C benchmark.

Reliability of deep learning models for anatomical landmark detection: The role of inter-rater variability

Nov 26, 2024

Soorena Salari, Hassan Rivaz, Yiming Xiao

Abstract:Automated detection of anatomical landmarks plays a crucial role in many diagnostic and surgical applications. Progress in deep learning (DL) methods has resulted in significant performance enhancements in tasks related to anatomical landmark detection. While current research focuses on accurately localizing these landmarks in medical scans, the importance of inter-rater annotation variability in building DL models is often overlooked. Understanding how inter-rater variability impacts the performance and reliability of the resulting DL algorithms, which are crucial for clinical deployment, can inform the improvement of training data construction and boost DL models' outcomes. In this paper, we conducted a thorough study of different annotation-fusion strategies to preserve inter-rater variability in DL models for anatomical landmark detection, aiming to boost the performance and reliability of the resulting algorithms. Additionally, we explored the characteristics and reliability of four metrics, including a novel Weighted Coordinate Variance metric, to quantify landmark detection uncertainty/inter-rater variability. Our research highlights the crucial connection between inter-rater variability, DL model performance, and uncertainty, revealing how different approaches for multi-rater landmark annotation fusion can influence these factors.

* Accepted to SPIE Medical Imaging 2025

CAMLD: Contrast-Agnostic Medical Landmark Detection with Consistency-Based Regularization

Nov 26, 2024

Soorena Salari, Arash Harirpoush, Hassan Rivaz, Yiming Xiao

Abstract:Anatomical landmark detection in medical images is essential for various clinical and research applications, including disease diagnosis and surgical planning. However, manual landmark annotation is time-consuming and requires significant expertise. Existing deep learning (DL) methods often require large amounts of well-annotated data, which are costly to acquire. In this paper, we introduce CAMLD, a novel self-supervised DL framework for anatomical landmark detection in unlabeled scans with varying contrasts by using only a single reference example. To achieve this, we employed an inter-subject landmark consistency loss with an image registration loss while introducing a 3D convolution-based contrast augmentation strategy to promote model generalization to new contrasts. Additionally, we utilize an adaptive mixed loss function to schedule the contributions of different sub-tasks for optimal outcomes. We demonstrate the proposed method with the intricate task of MRI-based 3D brain landmark detection. With comprehensive experiments on four diverse clinical and public datasets, including both T1w and T2w MRI scans at different MRI field strengths, we demonstrate that CAMLD outperforms the state-of-the-art methods in terms of mean radial errors (MREs) and success detection rates (SDRs). Our framework provides a robust and accurate solution for anatomical landmark detection, reducing the need for extensively annotated datasets and generalizing well across different imaging contrasts. Our code will be publicly available at: https://github.com/HealthX-Lab/CAMLD.

* 14 pages, 6 figures, 3 tables
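
The abstract above evaluates landmark detection with mean radial errors (MREs) and success detection rates (SDRs). A minimal sketch of the usual definitions follows, assuming landmark coordinates are already expressed in millimetres; the function name `mre_and_sdr`, the threshold, and the sample coordinates are hypothetical and not drawn from the paper:

```python
import math


def mre_and_sdr(pred, truth, threshold_mm):
    """Return (MRE, SDR) for paired predicted and reference landmarks.

    MRE is the mean Euclidean distance between each pair.
    SDR is the fraction of pairs whose distance is within threshold_mm.
    """
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    mre = sum(dists) / len(dists)
    sdr = sum(d <= threshold_mm for d in dists) / len(dists)
    return mre, sdr


# Hypothetical 3D landmarks (x, y, z) in millimetres.
pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (1.0, 1.0, 1.0)]
truth = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 1.0, 3.0)]
mre, sdr = mre_and_sdr(pred, truth, threshold_mm=2.0)
print(f"MRE={mre:.3f} mm, SDR@2mm={sdr:.3f}")
```

SDR is normally reported at several thresholds, so the same distance list can be reused with different values of `threshold_mm`.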

BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models

Nov 21, 2024

Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz, Yiming Xiao

Abstract:Recent advancements in vision-language models (VLMs), such as CLIP, have demonstrated substantial success in self-supervised representation learning for vision tasks. However, effectively adapting VLMs to downstream applications remains challenging, as their accuracy often depends on time-intensive and expertise-demanding prompt engineering, while full model fine-tuning is costly. This is particularly true for biomedical images, which, unlike natural images, typically suffer from limited annotated datasets, unintuitive image contrasts, and nuanced visual features. Recent prompt learning techniques, such as Context Optimization (CoOp), aim to tackle these issues but still fall short in generalizability. Meanwhile, explorations in prompt learning for biomedical image analysis are still highly limited. In this work, we propose BiomedCoOp, a novel prompt learning framework that enables efficient adaptation of BiomedCLIP for accurate and highly generalizable few-shot biomedical image classification. Our approach achieves effective prompt context learning by leveraging semantic consistency with average prompt ensembles from Large Language Models (LLMs) and knowledge distillation with a statistics-based prompt selection strategy. We conducted comprehensive validation of our proposed framework on 11 medical datasets across 9 modalities and 10 organs against existing state-of-the-art methods, demonstrating significant improvements in both accuracy and generalizability. The code will be publicly available at https://github.com/HealthX-Lab/BiomedCoOp.

* 18 pages, 5 figures, 10 tables

MedCLIP-SAMv2: Towards Universal Text-Driven Medical Image Segmentation

Sep 28, 2024

Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz, Yiming Xiao

Abstract:Segmentation of anatomical structures and pathological regions in medical images is essential for modern clinical diagnosis, disease research, and treatment planning. While significant advancements have been made in deep learning-based segmentation techniques, many of these methods still suffer from limitations in data efficiency, generalizability, and interactivity. As a result, developing precise segmentation methods that require fewer labeled datasets remains a critical challenge in medical image analysis. Recently, the introduction of foundation models like CLIP and Segment-Anything-Model (SAM), with robust cross-domain representations, has paved the way for interactive and universal image segmentation. However, further exploration of these models for data-efficient segmentation in medical imaging is still needed and highly relevant. In this paper, we introduce MedCLIP-SAMv2, a novel framework that integrates the CLIP and SAM models to perform segmentation on clinical scans using text prompts, in both zero-shot and weakly supervised settings. Our approach includes fine-tuning the BiomedCLIP model with a new Decoupled Hard Negative Noise Contrastive Estimation (DHN-NCE) loss, and leveraging the Multi-modal Information Bottleneck (M2IB) to create visual prompts for generating segmentation masks from SAM in the zero-shot setting. We also investigate using zero-shot segmentation labels within a weakly supervised paradigm to enhance segmentation quality further. Extensive testing across four diverse segmentation tasks and medical imaging modalities (breast tumor ultrasound, brain tumor MRI, lung X-ray, and lung CT) demonstrates the high accuracy of our proposed framework. Our code is available at https://github.com/HealthX-Lab/MedCLIP-SAMv2.

* 10 pages, 2 figures, 6 tables

Parkinson's Disease Detection from Resting State EEG using Multi-Head Graph Structure Learning with Gradient Weighted Graph Attention Explanations

Aug 01, 2024

Christopher Neves, Yong Zeng, Yiming Xiao

Abstract:Parkinson's disease (PD) is a debilitating neurodegenerative disease that has severe impacts on an individual's quality of life. Compared with structural and functional MRI-based biomarkers for the disease, electroencephalography (EEG) can provide more accessible alternatives for clinical insights. While deep learning (DL) techniques have provided excellent outcomes, many techniques fail to model spatial information and dynamic brain connectivity, and face challenges in robust feature learning, limited data sizes, and poor explainability. To address these issues, we proposed a novel graph neural network (GNN) technique for explainable PD detection using resting state EEG. Specifically, we employ structured global convolutions with contrastive learning to better model complex features with limited data, a novel multi-head graph structure learner to capture the non-Euclidean structure of EEG data, and a head-wise gradient-weighted graph attention explainer to offer neural connectivity insights. We developed and evaluated our method using the UC San Diego Parkinson's disease EEG dataset, and achieved 69.40% detection accuracy in subject-wise leave-one-out cross-validation while generating intuitive explanations for the learnt graph topology.

* Accepted at MLCN 2024
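
The abstract above reports accuracy under subject-wise leave-one-out cross-validation. A minimal sketch of that splitting scheme is shown below, where every sample from one subject is held out together so that no subject appears in both the training and test sets; the helper `leave_one_subject_out` and the subject labels are illustrative assumptions, not the authors' code:

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject.

    subject_ids[i] is the subject that sample i belongs to.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical dataset: six EEG segments recorded from three subjects.
subject_ids = ["s1", "s1", "s2", "s3", "s3", "s3"]
folds = list(leave_one_subject_out(subject_ids))
for held_out, train, test in folds:
    print(held_out, "train:", train, "test:", test)
```

Splitting by subject rather than by segment avoids leaking subject-specific signal characteristics from training into evaluation.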

CT-based brain ventricle segmentation via diffusion Schrödinger Bridge without target domain ground truths

May 28, 2024

Reihaneh Teimouri, Marta Kersten-Oertel, Yiming Xiao

Abstract:Efficient and accurate brain ventricle segmentation from clinical CT scans is critical for emergency surgeries like ventriculostomy. With the challenges in poor soft tissue contrast and a scarcity of well-annotated databases for clinical brain CTs, we introduce a novel uncertainty-aware ventricle segmentation technique without the need for CT segmentation ground truths by leveraging diffusion-model-based domain adaptation. Specifically, our method employs the diffusion Schrödinger Bridge and an attention recurrent residual U-Net to capitalize on unpaired CT and MRI scans to derive automatic CT segmentation from those of the MRIs, which are more accessible. Importantly, we propose an end-to-end, joint training framework of image translation and segmentation tasks, and demonstrate its benefit over training individual tasks separately. By comparing the proposed method against similar setups using two different GAN models for domain adaptation (CycleGAN and CUT), we also reveal the advantage of diffusion models towards improved segmentation and image translation quality. With a Dice score of 0.78 ± 0.27, our proposed method outperformed the compared methods, including SynSeg-Net, while providing intuitive uncertainty measures to further facilitate quality control of the automatic segmentation outcomes.

* Early acceptance at MICCAI2024
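
The abstract above summarizes segmentation quality as a Dice score reported in mean ± standard deviation form. The sketch below shows one common way such a summary is produced: a Dice coefficient per case, then the mean and spread across cases. The masks are toy data, and the choice of population standard deviation (`pstdev`) is an assumption, since the abstract does not say which estimator was used:

```python
from statistics import mean, pstdev


def dice(pred, target):
    """Dice coefficient for two flattened binary masks of equal length."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size = sum(pred) + sum(target)
    if size == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0
    return 2 * intersection / size


# Hypothetical (prediction, reference) pairs, one per scan.
cases = [
    ([1, 1, 0, 0], [1, 1, 0, 0]),
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([1, 0, 0, 0], [0, 1, 0, 0]),
]
scores = [dice(p, t) for p, t in cases]
print(f"Dice = {mean(scores):.2f} ± {pstdev(scores):.2f}")
```

A large spread relative to the mean, as in the reported figure, usually indicates that a few cases score much lower than the rest.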

MedCLIP-SAM: Bridging Text and Image Towards Universal Medical Image Segmentation

Mar 29, 2024

Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz, Yiming Xiao

Abstract:Medical image segmentation of anatomical structures and pathology is crucial in modern clinical diagnosis, disease study, and treatment planning. To date, great progress has been made in deep learning-based segmentation techniques, but most methods still lack data efficiency, generalizability, and interactability. Consequently, the development of new, precise segmentation methods that demand fewer labeled datasets is of utmost importance in medical image analysis. Recently, the emergence of foundation models, such as CLIP and Segment-Anything-Model (SAM), with comprehensive cross-domain representation opened the door for interactive and universal image segmentation. However, exploration of these models for data-efficient medical image segmentation is still limited but highly necessary. In this paper, we propose a novel framework, called MedCLIP-SAM, that combines CLIP and SAM models to generate segmentation of clinical scans using text prompts in both zero-shot and weakly supervised settings. To achieve this, we employed a new Decoupled Hard Negative Noise Contrastive Estimation (DHN-NCE) loss to fine-tune the BiomedCLIP model and the recent gScoreCAM to generate prompts to obtain segmentation masks from SAM in a zero-shot setting. Additionally, we explored the use of zero-shot segmentation labels in a weakly supervised paradigm to improve the segmentation quality further. In extensive tests on three diverse segmentation tasks and medical image modalities (breast tumor ultrasound, brain tumor MRI, and lung X-ray), our proposed framework demonstrated excellent accuracy.

* 10 pages, 2 figures

Joint chest X-ray diagnosis and clinical visual attention prediction with multi-stage cooperative learning: enhancing interpretability

Mar 29, 2024

Zirui Qiu, Hassan Rivaz, Yiming Xiao

Abstract:As deep learning has become the state-of-the-art for computer-assisted diagnosis, interpretability of the automatic decisions is crucial for clinical deployment. While various methods were proposed in this domain, visual attention maps of clinicians during radiological screening offer a unique asset to provide important insights and can potentially enhance the quality of computer-assisted diagnosis. With this paper, we introduce a novel deep-learning framework for joint disease diagnosis and prediction of corresponding visual saliency maps for chest X-ray scans. Specifically, we designed a novel dual-encoder multi-task UNet, which leverages both a DenseNet201 backbone and a Residual and Squeeze-and-Excitation block-based encoder to extract diverse features for saliency map prediction, and a multi-scale feature-fusion classifier to perform disease classification. To tackle the issue of asynchronous training schedules of individual tasks in multi-task learning, we proposed a multi-stage cooperative learning strategy, with contrastive learning for feature encoder pretraining to boost performance. Experiments show that our proposed method outperformed existing techniques for chest X-ray diagnosis and the quality of visual saliency map prediction.
