Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Farhana Zulkernine

Interpreting Biomedical VLMs on High-Imbalance Out-of-Distributions: An Insight into BiomedCLIP on Radiology

Jun 17, 2025

Nafiz Sadman, Farhana Zulkernine, Benjamin Kwan

Abstract:In this paper, we construct two research objectives: i) explore the learned embedding space of BiomedCLIP, an open-source large vision language model, to analyse meaningful class separations, and ii) quantify the limitations of BiomedCLIP when applied to a highly imbalanced, out-of-distribution multi-label medical dataset. We experiment on IU-xray dataset, which exhibits the aforementioned criteria, and evaluate BiomedCLIP in classifying images (radiographs) in three contexts: zero-shot inference, full finetuning, and linear probing. The results show that the model under zero-shot settings over-predicts all labels, leading to poor precision and inter-class separability. Full fine-tuning improves classification of distinct diseases, while linear probing detects overlapping features. We demonstrate visual understanding of the model using Grad-CAM heatmaps and compare with 15 annotations by a radiologist. We highlight the need for careful adaptations of the models to foster reliability and applicability in a real-world setting. The code for the experiments in this work is available and maintained on GitHub.

* GitHub: https://github.com/Nafiz95/BioVLM_Eval_CXR

Via

Access Paper or Ask Questions

SDA-GRIN for Adaptive Spatial-Temporal Multivariate Time Series Imputation

Oct 04, 2024

Amir Eskandari, Aman Anand, Drishti Sharma, Farhana Zulkernine

Figure 1 for SDA-GRIN for Adaptive Spatial-Temporal Multivariate Time Series Imputation

Figure 2 for SDA-GRIN for Adaptive Spatial-Temporal Multivariate Time Series Imputation

Figure 3 for SDA-GRIN for Adaptive Spatial-Temporal Multivariate Time Series Imputation

Figure 4 for SDA-GRIN for Adaptive Spatial-Temporal Multivariate Time Series Imputation

Abstract:In various applications, the multivariate time series often suffers from missing data. This issue can significantly disrupt systems that rely on the data. Spatial and temporal dependencies can be leveraged to impute the missing samples. Existing imputation methods often ignore dynamic changes in spatial dependencies. We propose a Spatial Dynamic Aware Graph Recurrent Imputation Network (SDA-GRIN) which is capable of capturing dynamic changes in spatial dependencies.SDA-GRIN leverages a multi-head attention mechanism to adapt graph structures with time. SDA-GRIN models multivariate time series as a sequence of temporal graphs and uses a recurrent message-passing architecture for imputation. We evaluate SDA-GRIN on four real-world datasets: SDA-GRIN improves MSE by 9.51% for the AQI and 9.40% for AQI-36. On the PEMS-BAY dataset, it achieves a 1.94% improvement in MSE. Detailed ablation study demonstrates the effect of window sizes and missing data on the performance of the method. Project page:https://ameskandari.github.io/sda-grin/

Via

Access Paper or Ask Questions

Survey: Transformer-based Models in Data Modality Conversion

Aug 08, 2024

Elyas Rashno, Amir Eskandari, Aman Anand, Farhana Zulkernine

Abstract:Transformers have made significant strides across various artificial intelligence domains, including natural language processing, computer vision, and audio processing. This success has naturally garnered considerable interest from both academic and industry researchers. Consequently, numerous Transformer variants (often referred to as X-formers) have been developed for these fields. However, a thorough and systematic review of these modality-specific conversions remains lacking. Modality Conversion involves the transformation of data from one form of representation to another, mimicking the way humans integrate and interpret sensory information. This paper provides a comprehensive review of transformer-based models applied to the primary modalities of text, vision, and speech, discussing their architectures, conversion methodologies, and applications. By synthesizing the literature on modality conversion, this survey aims to underline the versatility and scalability of transformers in advancing AI-driven content generation and understanding.

* Submitted to ACM Computing Surveys (CSUR)

Via

Access Paper or Ask Questions

Leveraging Large Language Models for Patient Engagement: The Power of Conversational AI in Digital Health

Jun 19, 2024

Bo Wen, Raquel Norel, Julia Liu, Thaddeus Stappenbeck, Farhana Zulkernine, Huamin Chen

Figure 1 for Leveraging Large Language Models for Patient Engagement: The Power of Conversational AI in Digital Health

Figure 2 for Leveraging Large Language Models for Patient Engagement: The Power of Conversational AI in Digital Health

Figure 3 for Leveraging Large Language Models for Patient Engagement: The Power of Conversational AI in Digital Health

Figure 4 for Leveraging Large Language Models for Patient Engagement: The Power of Conversational AI in Digital Health

Abstract:The rapid advancements in large language models (LLMs) have opened up new opportunities for transforming patient engagement in healthcare through conversational AI. This paper presents an overview of the current landscape of LLMs in healthcare, specifically focusing on their applications in analyzing and generating conversations for improved patient engagement. We showcase the power of LLMs in handling unstructured conversational data through four case studies: (1) analyzing mental health discussions on Reddit, (2) developing a personalized chatbot for cognitive engagement in seniors, (3) summarizing medical conversation datasets, and (4) designing an AI-powered patient engagement system. These case studies demonstrate how LLMs can effectively extract insights and summarizations from unstructured dialogues and engage patients in guided, goal-oriented conversations. Leveraging LLMs for conversational analysis and generation opens new doors for many patient-centered outcomes research opportunities. However, integrating LLMs into healthcare raises important ethical considerations regarding data privacy, bias, transparency, and regulatory compliance. We discuss best practices and guidelines for the responsible development and deployment of LLMs in healthcare settings. Realizing the full potential of LLMs in digital health will require close collaboration between the AI and healthcare professionals communities to address technical challenges and ensure these powerful tools' safety, efficacy, and equity.

* 10 pages, 6 figures, ICDH 2024 invited paper

Via

Access Paper or Ask Questions

Comparative Analysis of Open-Source Language Models in Summarizing Medical Text Data

May 29, 2024

Yuhao Chen, Zhimu Wang, Bo Wen, Farhana Zulkernine

Figure 1 for Comparative Analysis of Open-Source Language Models in Summarizing Medical Text Data

Abstract:Unstructured text in medical notes and dialogues contains rich information. Recent advancements in Large Language Models (LLMs) have demonstrated superior performance in question answering and summarization tasks on unstructured text data, outperforming traditional text analysis approaches. However, there is a lack of scientific studies in the literature that methodically evaluate and report on the performance of different LLMs, specifically for domain-specific data such as medical chart notes. We propose an evaluation approach to analyze the performance of open-source LLMs such as Llama2 and Mistral for medical summarization tasks, using GPT-4 as an assessor. Our innovative approach to quantitative evaluation of LLMs can enable quality control, support the selection of effective LLMs for specific tasks, and advance knowledge discovery in digital health.

Via

Access Paper or Ask Questions

CoBEVFusion: Cooperative Perception with LiDAR-Camera Bird's-Eye View Fusion

Oct 09, 2023

Donghao Qiao, Farhana Zulkernine

Abstract:Autonomous Vehicles (AVs) use multiple sensors to gather information about their surroundings. By sharing sensor data between Connected Autonomous Vehicles (CAVs), the safety and reliability of these vehicles can be improved through a concept known as cooperative perception. However, recent approaches in cooperative perception only share single sensor information such as cameras or LiDAR. In this research, we explore the fusion of multiple sensor data sources and present a framework, called CoBEVFusion, that fuses LiDAR and camera data to create a Bird's-Eye View (BEV) representation. The CAVs process the multi-modal data locally and utilize a Dual Window-based Cross-Attention (DWCA) module to fuse the LiDAR and camera features into a unified BEV representation. The fused BEV feature maps are shared among the CAVs, and a 3D Convolutional Neural Network is applied to aggregate the features from the CAVs. Our CoBEVFusion framework was evaluated on the cooperative perception dataset OPV2V for two perception tasks: BEV semantic segmentation and 3D object detection. The results show that our DWCA LiDAR-camera fusion model outperforms perception models with single-modal data and state-of-the-art BEV fusion models. Our overall cooperative perception architecture, CoBEVFusion, also achieves comparable performance with other cooperative perception models.

Via

Access Paper or Ask Questions

A Preliminary Study on Pattern Reconstruction for Optimal Storage of Wearable Sensor Data

Feb 25, 2023

Sazia Mahfuz, Farhana Zulkernine

Figure 1 for A Preliminary Study on Pattern Reconstruction for Optimal Storage of Wearable Sensor Data

Figure 2 for A Preliminary Study on Pattern Reconstruction for Optimal Storage of Wearable Sensor Data

Figure 3 for A Preliminary Study on Pattern Reconstruction for Optimal Storage of Wearable Sensor Data

Abstract:Efficient querying and retrieval of healthcare data is posing a critical challenge today with numerous connected devices continuously generating petabytes of images, text, and internet of things (IoT) sensor data. One approach to efficiently store the healthcare data is to extract the relevant and representative features and store only those features instead of the continuous streaming data. However, it raises a question as to the amount of information content we can retain from the data and if we can reconstruct the pseudo-original data when needed. By facilitating relevant and representative feature extraction, storage and reconstruction of near original pattern, we aim to address some of the challenges faced by the explosion of the streaming data. We present a preliminary study, where we explored multiple autoencoders for concise feature extraction and reconstruction for human activity recognition (HAR) sensor data. Our Multi-Layer Perceptron (MLP) deep autoencoder achieved a storage reduction of 90.18% compared to the three other implemented autoencoders namely convolutional autoencoder, Long-Short Term Memory (LSTM) autoencoder, and convolutional LSTM autoencoder which achieved storage reductions of 11.18%, 49.99%, and 72.35% respectively. Encoded features from the autoencoders have smaller size and dimensions which help to reduce the storage space. For higher dimensions of the representation, storage reduction was low. But retention of relevant information was high, which was validated by classification performed on the reconstructed data.

* 6 pages, Learning from Time Series for Health, Workshop at NeurIPS 2022

Via

Access Paper or Ask Questions

A Web Application for Experimenting and Validating Remote Measurement of Vital Signs

Aug 21, 2022

Amtul Haq Ayesha, Donghao Qiao, Farhana Zulkernine

Figure 1 for A Web Application for Experimenting and Validating Remote Measurement of Vital Signs

Figure 2 for A Web Application for Experimenting and Validating Remote Measurement of Vital Signs

Figure 3 for A Web Application for Experimenting and Validating Remote Measurement of Vital Signs

Figure 4 for A Web Application for Experimenting and Validating Remote Measurement of Vital Signs

Abstract:With a surge in online medical advising remote monitoring of patient vitals is required. This can be facilitated with the Remote Photoplethysmography (rPPG) techniques that compute vital signs from facial videos. It involves processing video frames to obtain skin pixels, extracting the cardiac data from it and applying signal processing filters to extract the Blood Volume Pulse (BVP) signal. Different algorithms are applied to the BVP signal to estimate the various vital signs. We implemented a web application framework to measure a person's Heart Rate (HR), Heart Rate Variability (HRV), Oxygen Saturation (SpO2), Respiration Rate (RR), Blood Pressure (BP), and stress from the face video. The rPPG technique is highly sensitive to illumination and motion variation. The web application guides the users to reduce the noise due to these variations and thereby yield a cleaner BVP signal. The accuracy and robustness of the framework was validated with the help of volunteers.

* 12 pages, 2 figures

Via

Access Paper or Ask Questions

ReViSe: Remote Vital Signs Measurement Using Smartphone Camera

Jun 13, 2022

Donghao Qiao, Amtul Haq Ayesha, Farhana Zulkernine, Raihan Masroor, Nauman Jaffar

Figure 1 for ReViSe: Remote Vital Signs Measurement Using Smartphone Camera

Figure 2 for ReViSe: Remote Vital Signs Measurement Using Smartphone Camera

Figure 3 for ReViSe: Remote Vital Signs Measurement Using Smartphone Camera

Figure 4 for ReViSe: Remote Vital Signs Measurement Using Smartphone Camera

Abstract:Remote Photoplethysmography (rPPG) is a fast, effective, inexpensive and convenient method for collecting biometric data as it enables vital signs estimation using face videos. Remote contactless medical service provisioning has proven to be a dire necessity during the COVID-19 pandemic. We propose an end-to-end framework to measure people's vital signs including Heart Rate (HR), Heart Rate Variability (HRV), Oxygen Saturation (SpO2) and Blood Pressure (BP) based on the rPPG methodology from the video of a user's face captured with a smartphone camera. We extract face landmarks with a deep learning-based neural network model in real-time. Multiple face patches also called Region-of-Interests (RoIs) are extracted by using the predicted face landmarks. Several filters are applied to reduce the noise from the RoIs in the extracted cardiac signals called Blood Volume Pulse (BVP) signal. We trained and validated machine learning models using two public rPPG datasets namely the TokyoTech rPPG and the Pulse Rate Detection (PURE) datasets, on which our models achieved the following Mean Absolute Errors (MAE): a) for HR, 1.73 and 3.95 Beats-Per-Minute (bpm) respectively, b) for HRV, 18.55 and 25.03 ms respectively, and c) for SpO2, a MAE of 1.64 on the PURE dataset. We validated our end-to-end rPPG framework, ReViSe, in real life environment, and thereby created the Video-HR dataset. Our HR estimation model achieved a MAE of 2.49 bpm on this dataset. Since no publicly available rPPG datasets existed for BP measurement with face videos, we used a dataset with signals from fingertip sensor to train our model and also created our own video dataset, Video-BP. On our Video-BP dataset, our BP estimation model achieved a MAE of 6.7 mmHg for Systolic Blood Pressure (SBP), and a MAE of 9.6 mmHg for Diastolic Blood Pressure (DBP).

Via

Access Paper or Ask Questions

Incremental Community Detection in Distributed Dynamic Graph

Oct 12, 2021

Tariq Abughofa, Ahmed A. Harby, Haruna Isah, Farhana Zulkernine

Figure 1 for Incremental Community Detection in Distributed Dynamic Graph

Figure 2 for Incremental Community Detection in Distributed Dynamic Graph

Figure 3 for Incremental Community Detection in Distributed Dynamic Graph

Figure 4 for Incremental Community Detection in Distributed Dynamic Graph

Abstract:Community detection is an important research topic in graph analytics that has a wide range of applications. A variety of static community detection algorithms and quality metrics were developed in the past few years. However, most real-world graphs are not static and often change over time. In the case of streaming data, communities in the associated graph need to be updated either continuously or whenever new data streams are added to the graph, which poses a much greater challenge in devising good community detection algorithms for maintaining dynamic graphs over streaming data. In this paper, we propose an incremental community detection algorithm for maintaining a dynamic graph over streaming data. The contributions of this study include (a) the implementation of a Distributed Weighted Community Clustering (DWCC) algorithm, (b) the design and implementation of a novel Incremental Distributed Weighted Community Clustering (IDWCC) algorithm, and (c) an experimental study to compare the performance of our IDWCC algorithm with the DWCC algorithm. We validate the functionality and efficiency of our framework in processing streaming data and performing large in-memory distributed dynamic graph analytics. The results demonstrate that our IDWCC algorithm performs up to three times faster than the DWCC algorithm for a similar accuracy.

* BigDataService 2021 best paper award

Via

Access Paper or Ask Questions