Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Agnik Saha

Large Language Models for Cancer Communication: Evaluating Linguistic Quality, Safety, and Accessibility in Generative AI

May 15, 2025

Agnik Saha, Victoria Churchill, Anny D. Rodriguez, Ugur Kursuncu, Muhammed Y. Idris

Abstract:Effective communication about breast and cervical cancers remains a persistent health challenge, with significant gaps in public understanding of cancer prevention, screening, and treatment, potentially leading to delayed diagnoses and inadequate treatments. This study evaluates the capabilities and limitations of Large Language Models (LLMs) in generating accurate, safe, and accessible cancer-related information to support patient understanding. We evaluated five general-purpose and three medical LLMs using a mixed-methods evaluation framework across linguistic quality, safety and trustworthiness, and communication accessibility and affectiveness. Our approach utilized quantitative metrics, qualitative expert ratings, and statistical analysis using Welch's ANOVA, Games-Howell, and Hedges' g. Our results show that general-purpose LLMs produced outputs of higher linguistic quality and affectiveness, while medical LLMs demonstrate greater communication accessibility. However, medical LLMs tend to exhibit higher levels of potential harm, toxicity, and bias, reducing their performance in safety and trustworthiness. Our findings indicate a duality between domain-specific knowledge and safety in health communications. The results highlight the need for intentional model design with targeted improvements, particularly in mitigating harm and bias, and improving safety and affectiveness. This study provides a comprehensive evaluation of LLMs for cancer communication, offering critical insights for improving AI-generated health content and informing future development of accurate, safe, and accessible digital health tools.

Via

Access Paper or Ask Questions

Playing Devil's Advocate: Unmasking Toxicity and Vulnerabilities in Large Vision-Language Models

Jan 14, 2025

Abdulkadir Erol, Trilok Padhi, Agnik Saha, Ugur Kursuncu, Mehmet Emin Aktas

Figure 1 for Playing Devil's Advocate: Unmasking Toxicity and Vulnerabilities in Large Vision-Language Models

Figure 2 for Playing Devil's Advocate: Unmasking Toxicity and Vulnerabilities in Large Vision-Language Models

Figure 3 for Playing Devil's Advocate: Unmasking Toxicity and Vulnerabilities in Large Vision-Language Models

Figure 4 for Playing Devil's Advocate: Unmasking Toxicity and Vulnerabilities in Large Vision-Language Models

Abstract:The rapid advancement of Large Vision-Language Models (LVLMs) has enhanced capabilities offering potential applications from content creation to productivity enhancement. Despite their innovative potential, LVLMs exhibit vulnerabilities, especially in generating potentially toxic or unsafe responses. Malicious actors can exploit these vulnerabilities to propagate toxic content in an automated (or semi-) manner, leveraging the susceptibility of LVLMs to deception via strategically crafted prompts without fine-tuning or compute-intensive procedures. Despite the red-teaming efforts and inherent potential risks associated with the LVLMs, exploring vulnerabilities of LVLMs remains nascent and yet to be fully addressed in a systematic manner. This study systematically examines the vulnerabilities of open-source LVLMs, including LLaVA, InstructBLIP, Fuyu, and Qwen, using adversarial prompt strategies that simulate real-world social manipulation tactics informed by social theories. Our findings show that (i) toxicity and insulting are the most prevalent behaviors, with the mean rates of 16.13% and 9.75%, respectively; (ii) Qwen-VL-Chat, LLaVA-v1.6-Vicuna-7b, and InstructBLIP-Vicuna-7b are the most vulnerable models, exhibiting toxic response rates of 21.50%, 18.30% and 17.90%, and insulting responses of 13.40%, 11.70% and 10.10%, respectively; (iii) prompting strategies incorporating dark humor and multimodal toxic prompt completion significantly elevated these vulnerabilities. Despite being fine-tuned for safety, these models still generate content with varying degrees of toxicity when prompted with adversarial inputs, highlighting the urgent need for enhanced safety mechanisms and robust guardrails in LVLM development.

Via

Access Paper or Ask Questions

Evaluating the Ebb and Flow: An In-depth Analysis of Question-Answering Trends across Diverse Platforms

Sep 12, 2023

Rima Hazra, Agnik Saha, Somnath Banerjee, Animesh Mukherjee

Figure 1 for Evaluating the Ebb and Flow: An In-depth Analysis of Question-Answering Trends across Diverse Platforms

Figure 2 for Evaluating the Ebb and Flow: An In-depth Analysis of Question-Answering Trends across Diverse Platforms

Figure 3 for Evaluating the Ebb and Flow: An In-depth Analysis of Question-Answering Trends across Diverse Platforms

Figure 4 for Evaluating the Ebb and Flow: An In-depth Analysis of Question-Answering Trends across Diverse Platforms

Abstract:Community Question Answering (CQA) platforms steadily gain popularity as they provide users with fast responses to their queries. The swiftness of these responses is contingent on a mixture of query-specific and user-related elements. This paper scrutinizes these contributing factors within the context of six highly popular CQA platforms, identified through their standout answering speed. Our investigation reveals a correlation between the time taken to yield the first response to a question and several variables: the metadata, the formulation of the questions, and the level of interaction among users. Additionally, by employing conventional machine learning models to analyze these metadata and patterns of user interaction, we endeavor to predict which queries will receive their initial responses promptly.

* Under review

Via

Access Paper or Ask Questions