Abstract:Over the last decade, the use of machine learning (ML) approaches in medicinal applications has increased manifold. Most of these approaches are based on deep learning, which aims to learn representations from grid data (like medical images). However, reinforcement learning (RL) applications in medicine are relatively less explored. Medical applications often involve a sequence of subtasks that form a diagnostic pipeline, and RL is uniquely suited to optimize over such sequential decision-making tasks. Ultrasound (US) image analysis is a quintessential example of such a sequential decision-making task, where the raw signal captured by the US transducer undergoes a series of signal processing and image post-processing steps, generally leading to a diagnostic suggestion. The application of RL in US remains limited. Deep Reinforcement Learning (DRL), that combines deep learning and RL, holds great promise in optimizing these pipelines by enabling intelligent and sequential decision-making. This review paper surveys the applications of RL in US over the last decade. We provide a succinct overview of the theoretic framework of RL and its application in US image processing and review existing work in each aspect of the image analysis pipeline. A comprehensive search of Scopus filtered on relevance yielded 14 papers most relevant to this topic. These papers were further categorized based on their target applications image classification, image segmentation, image enhancement, video summarization, and auto navigation and path planning. We also examined the type of RL approach used in each publication. Finally, we discuss key areas in healthcare where DRL approaches in US could be used for sequential decision-making. We analyze the opportunities, challenges, and limitations, providing insights into the future potential of DRL in US image analysis.
Abstract:Foundation models like the segment anything model require high-quality manual prompts for medical image segmentation, which is time-consuming and requires expertise. SAM and its variants often fail to segment structures in ultrasound (US) images due to domain shift. We propose Sam2Rad, a prompt learning approach to adapt SAM and its variants for US bone segmentation without human prompts. It introduces a prompt predictor network (PPN) with a cross-attention module to predict prompt embeddings from image encoder features. PPN outputs bounding box and mask prompts, and 256-dimensional embeddings for regions of interest. The framework allows optional manual prompting and can be trained end-to-end using parameter-efficient fine-tuning (PEFT). Sam2Rad was tested on 3 musculoskeletal US datasets: wrist (3822 images), rotator cuff (1605 images), and hip (4849 images). It improved performance across all datasets without manual prompts, increasing Dice scores by 2-7% for hip/wrist and up to 33% for shoulder data. Sam2Rad can be trained with as few as 10 labeled images and is compatible with any SAM architecture for automatic segmentation.
Abstract:Conventional deep learning models deal with images one-by-one, requiring costly and time-consuming expert labeling in the field of medical imaging, and domain-specific restriction limits model generalizability. Visual in-context learning (ICL) is a new and exciting area of research in computer vision. Unlike conventional deep learning, ICL emphasizes the model's ability to adapt to new tasks based on given examples quickly. Inspired by MAE-VQGAN, we proposed a new simple visual ICL method called SimICL, combining visual ICL pairing images with masked image modeling (MIM) designed for self-supervised learning. We validated our method on bony structures segmentation in a wrist ultrasound (US) dataset with limited annotations, where the clinical objective was to segment bony structures to help with further fracture detection. We used a test set containing 3822 images from 18 patients for bony region segmentation. SimICL achieved an remarkably high Dice coeffient (DC) of 0.96 and Jaccard Index (IoU) of 0.92, surpassing state-of-the-art segmentation and visual ICL models (a maximum DC 0.86 and IoU 0.76), with SimICL DC and IoU increasing up to 0.10 and 0.16. This remarkably high agreement with limited manual annotations indicates SimICL could be used for training AI models even on small US datasets. This could dramatically decrease the human expert time required for image labeling compared to conventional approaches, and enhance the real-world use of AI assistance in US image analysis.
Abstract:Osteoarthritis (OA) poses a global health challenge, demanding precise diagnostic methods. Current radiographic assessments are time consuming and prone to variability, prompting the need for automated solutions. The existing deep learning models for OA assessment are unimodal single task systems and they don't incorporate relevant text information such as patient demographics, disease history, or physician reports. This study investigates employing Vision Language Processing (VLP) models to predict OA severity using Xray images and corresponding reports. Our method leverages Xray images of the knee and diverse report templates generated from tabular OA scoring values to train a CLIP (Contrastive Language Image PreTraining) style VLP model. Furthermore, we incorporate additional contrasting captions to enforce the model to discriminate between positive and negative reports. Results demonstrate the efficacy of these models in learning text image representations and their contextual relationships, showcase potential advancement in OA assessment, and establish a foundation for specialized vision language models in medical contexts.