Prithul Sarker

Can Score-Based Generative Modeling Effectively Handle Medical Image Classification?

Feb 24, 2025

Sushmita Sarker, Prithul Sarker, George Bebis, Alireza Tavakkoli

Abstract: The remarkable success of deep learning in recent years has prompted applications in medical image classification and diagnosis tasks. While classification models have demonstrated robustness in classifying simpler datasets like MNIST or natural images such as ImageNet, this resilience is not consistently observed in complex medical image datasets where data is more scarce and lacks diversity. Moreover, previous findings on natural image datasets have indicated a potential trade-off between data likelihood and classification accuracy. In this study, we explore the use of score-based generative models as classifiers for medical images, specifically mammographic images. Our findings suggest that our proposed generative classifier model not only achieves superior classification results on CBIS-DDSM, INbreast and Vin-Dr Mammo datasets, but also introduces a novel approach to image classification in a broader context. Our code is publicly available at https://github.com/sushmitasarker/sgc_for_medical_image_classification

* Accepted at the International Symposium on Biomedical Imaging (ISBI) 2025

General Geospatial Inference with a Population Dynamics Foundation Model

Nov 13, 2024

Mohit Agarwal, Mimi Sun, Chaitanya Kamath, Arbaaz Muslim, Prithul Sarker, Joydeep Paul, Hector Yee, Marcin Sieniek, Kim Jablonski, Yael Mayer(+24 more)

Abstract: Supporting the health and well-being of dynamic populations around the world requires governmental agencies, organizations and researchers to understand and reason over complex relationships between human behavior and local contexts in order to identify high-risk groups and strategically allocate limited resources. Traditional approaches to these classes of problems often entail developing manually curated, task-specific features and models to represent human behavior and the natural and built environment, which can be challenging to adapt to new, or even related, tasks. To address this, we introduce a Population Dynamics Foundation Model (PDFM) that aims to capture the relationships between diverse data modalities and is applicable to a broad range of geospatial tasks. We first construct a geo-indexed dataset for postal codes and counties across the United States, capturing rich aggregated information on human behavior from maps, busyness, and aggregated search trends, and environmental factors such as weather and air quality. We then model this data and the complex relationships between locations using a graph neural network, producing embeddings that can be adapted to a wide range of downstream tasks using relatively simple models. We evaluate the effectiveness of our approach by benchmarking it on 27 downstream tasks spanning three distinct domains: health indicators, socioeconomic factors, and environmental measurements. The approach achieves state-of-the-art performance on all 27 geospatial interpolation tasks, and on 25 out of the 27 extrapolation and super-resolution tasks. We combined the PDFM with a state-of-the-art forecasting foundation model, TimesFM, to predict unemployment and poverty, achieving performance that surpasses fully supervised forecasting. The full set of embeddings and sample code are publicly available for researchers.

* 28 pages, 16 figures, preprint; v2: updated github url

A comprehensive overview of deep learning techniques for 3D point cloud classification and semantic segmentation

May 20, 2024

Sushmita Sarker, Prithul Sarker, Gunner Stone, Ryan Gorman, Alireza Tavakkoli, George Bebis, Javad Sattarvand

Abstract: Point cloud analysis has a wide range of applications in many areas such as computer vision, robotic manipulation, and autonomous driving. While deep learning has achieved remarkable success on image-based tasks, there are many unique challenges faced by deep neural networks in processing massive, unordered, irregular and noisy 3D points. To stimulate future research, this paper analyzes recent progress in deep learning methods employed for point cloud processing and presents challenges and potential directions to advance this field. It serves as a comprehensive review on two major tasks in 3D point cloud processing, namely 3D shape classification and semantic segmentation.

* Machine Vision and Applications 35, 67 (2024)
* Published in Springer Nature (Machine Vision and Applications)

MV-Swin-T: Mammogram Classification with Multi-view Swin Transformer

Feb 26, 2024

Sushmita Sarker, Prithul Sarker, George Bebis, Alireza Tavakkoli

Abstract: Traditional deep learning approaches for breast cancer classification have predominantly concentrated on single-view analysis. In clinical practice, however, radiologists concurrently examine all views within a mammography exam, leveraging the inherent correlations in these views to effectively detect tumors. Acknowledging the significance of multi-view analysis, some studies have introduced methods that independently process mammogram views, either through distinct convolutional branches or simple fusion strategies, inadvertently leading to a loss of crucial inter-view correlations. In this paper, we propose an innovative multi-view network exclusively based on transformers to address challenges in mammographic image classification. Our approach introduces a novel shifted window-based dynamic attention block, facilitating the effective integration of multi-view information and promoting the coherent transfer of this information between views at the spatial feature map level. Furthermore, we conduct a comprehensive comparative analysis of the performance and effectiveness of transformer-based models under diverse settings, employing the CBIS-DDSM and Vin-Dr Mammo datasets. Our code is publicly available at https://github.com/prithuls/MV-Swin-T

* 4 pages, 2 figures

ConnectedUNets++: Mass Segmentation from Whole Mammographic Images

Nov 04, 2022

Prithul Sarker, Sushmita Sarker, George Bebis, Alireza Tavakkoli

Abstract: Deep learning has made a breakthrough in medical image segmentation in recent years due to its ability to extract high-level features without the need for prior knowledge. In this context, U-Net is one of the most advanced medical image segmentation models, with promising results in mammography. Despite its excellent overall performance in segmenting multimodal medical images, the traditional U-Net structure appears to be inadequate in various ways. There are certain U-Net design modifications, such as MultiResUNet, Connected-UNets, and AU-Net, that have improved overall performance in areas where the conventional U-Net architecture appears to be deficient. Following the success of UNet and its variants, we have presented two enhanced versions of the Connected-UNets architecture: ConnectedUNets+ and ConnectedUNets++. In ConnectedUNets+, we have replaced the simple skip connections of Connected-UNets architecture with residual skip connections, while in ConnectedUNets++, we have modified the encoder-decoder structure along with employing residual skip connections. We have evaluated our proposed architectures on two publicly available datasets, the Curated Breast Imaging Subset of Digital Database for Screening Mammography (CBIS-DDSM) and INbreast.

* Results are to be updated

Virtual-Reality based Vestibular Ocular Motor Screening for Concussion Detection using Machine-Learning

Oct 13, 2022

Khondker Fariha Hossain, Sharif Amit Kamran, Prithul Sarker, Philip Pavilionis, Isayas Adhanom, Nicholas Murray, Alireza Tavakkoli

Abstract: Sport-related concussion (SRC) depends on sensory information from visual, vestibular, and somatosensory systems. At the same time, the current clinical administration of Vestibular/Ocular Motor Screening (VOMS) is subjective and deviates among administrators. Therefore, for the assessment and management of concussion detection, standardization is required to lower the risk of injury and increase the validation among clinicians. With the advancement of technology, virtual reality (VR) can be utilized to advance the standardization of the VOMS, increasing the accuracy of testing administration and decreasing overall false positive rates. In this paper, we experimented with multiple machine learning methods to detect SRC on VR-generated data using VOMS. In our observation, the data generated from VR for the smooth pursuit (SP) and Visual Motion Sensitivity (VMS) tests are highly reliable for concussion detection. Furthermore, we train and evaluate these models, both qualitatively and quantitatively. Our findings show these models can reach high true-positive rates of around 99.9 percent of symptom provocation on the VR stimuli-based VOMS vs. current clinical manual VOMS.

* Accepted at the 17th International Symposium on Visual Computing, 2022

Analysis of Smooth Pursuit Assessment in Virtual Reality and Concussion Detection using BiLSTM

Oct 12, 2022

Prithul Sarker, Khondker Fariha Hossain, Isayas Berhe Adhanom, Philip K Pavilionis, Nicholas G. Murray, Alireza Tavakkoli

Abstract: The sport-related concussion (SRC) battery relies heavily upon subjective symptom reporting in order to determine the diagnosis of a concussion. Unfortunately, athletes with SRC may return-to-play (RTP) too soon if they are untruthful about their symptoms. It is critical to provide accurate assessments that can overcome underreporting to prevent further injury. To lower the risk of injury, a more robust and precise method for detecting concussion is needed to produce reliable and objective results. In this paper, we propose a novel approach to detect SRC using long short-term memory (LSTM) recurrent neural network (RNN) architectures from oculomotor data. In particular, we propose a new error metric that incorporates mean squared error in different proportions. The experimental results on the smooth pursuit test of the VR-VOMS dataset suggest that the proposed approach can predict concussion symptoms with higher accuracy compared to symptom provocation on the vestibular ocular motor screening (VOMS).

* International Symposium on Visual Computing

VR-SFT: Reproducing Swinging Flashlight Test in Virtual Reality to Detect Relative Afferent Pupillary Defect

Oct 12, 2022

Prithul Sarker, Nasif Zaman, Alireza Tavakkoli

Abstract: The relative afferent asymmetry between two eyes can be diagnosed using the swinging flashlight test, also known as the alternating light test. This remains one of the most used clinical tests to this day. Despite the swinging flashlight test's straightforward approach, a number of factors can add variability into the clinical methodology and reduce the measurement's validity and reliability. These include small and poorly responsive pupils, dark irises, anisocoria, and uneven illumination in both eyes. Due to these limitations, the true condition of relative afferent asymmetry may create confusion and various observers may quantify the relative afferent pupillary defect differently. Consequently, the results of the swinging flashlight test are subjective and ambiguous. In order to eliminate the limitations of the traditional swinging flashlight test and introduce objectivity, we propose a novel approach to the swinging flashlight exam, VR-SFT, by making use of virtual reality (VR). We suggest that the clinical records of the subjects and the results of VR-SFT are comparable. In this paper, we describe how we exploit the features of immersive VR experience to create a reliable and objective swinging flashlight test.

* International Symposium on Visual Computing
