Akshay Paruchuri

"What's Up, Doc?": Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets

Jun 26, 2025

Akshay Paruchuri, Maryam Aziz, Rohit Vartak, Ayman Ali, Best Uchehara, Xin Liu, Ishan Chatterjee, Monica Agrawal

Abstract: People are increasingly seeking healthcare information from large language models (LLMs) via interactive chatbots, yet the nature and inherent risks of these conversations remain largely unexplored. In this paper, we filter large-scale conversational AI datasets to produce HealthChat-11K, a curated dataset of 11K real-world conversations composed of 25K user messages. We use HealthChat-11K and a clinician-driven taxonomy for how users interact with LLMs when seeking healthcare information in order to systematically study user interactions across 21 distinct health specialties. Our analysis reveals insights into the nature of how and why users seek health information, such as common interactions, instances of incomplete context, affective behaviors, and interactions (e.g., leading questions) that can induce sycophancy, underscoring the need for improvements in the healthcare support capabilities of LLMs deployed as conversational AI. Code and artifacts to retrieve our analyses and combine them into a curated dataset can be found here: https://github.com/yahskapar/HealthChat

* 25 pages, 6 figures, 4 tables, corresponds to initial HealthChat-11K dataset release

RADAR: Benchmarking Language Models on Imperfect Tabular Data

Jun 09, 2025

Ken Gu, Zhihan Zhang, Kate Lin, Yuwei Zhang, Akshay Paruchuri, Hong Yu, Mehran Kazemi, Kumar Ayush, A. Ali Heydari, Maxwell A. Xu (+11 more)

Abstract: Language models (LMs) are increasingly being deployed to perform autonomous data analyses. However, their data awareness -- the ability to recognize, reason over, and appropriately handle data artifacts such as missing values, outliers, and logical inconsistencies -- remains underexplored. These artifacts are especially common in real-world tabular data and, if mishandled, can significantly compromise the validity of analytical conclusions. To address this gap, we present RADAR, a benchmark for systematically evaluating data-aware reasoning on tabular data. We develop a framework to simulate data artifacts via programmatic perturbations to enable targeted evaluation of model behavior. RADAR comprises 2,980 table-query pairs, grounded in real-world data spanning 9 domains and 5 data artifact types. In addition to evaluating artifact handling, RADAR systematically varies table size to study how reasoning performance holds up as table size increases. Our evaluation reveals that, despite decent performance on tables without data artifacts, frontier models degrade significantly when data artifacts are introduced, exposing critical gaps in their capacity for robust, data-aware analysis. Designed to be flexible and extensible, RADAR supports diverse perturbation types and controllable table sizes, offering a valuable resource for advancing tabular reasoning.
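As a rough illustration of what a "programmatic perturbation" of a table can look like, the sketch below injects missing values into one column of a small in-memory table. It is not taken from the RADAR code; the function names `inject_missing` and `mean_ignoring_missing`, the `heart_rate` column, and the toy data are all assumptions made for this example.

```python
import copy
import random


def inject_missing(rows, column, fraction, seed=0):
    """Return a copy of `rows` with roughly `fraction` of `column` set to None.

    Hypothetical helper: one simple way to simulate a missing-value artifact.
    """
    rng = random.Random(seed)            # seeded so the perturbation is repeatable
    perturbed = copy.deepcopy(rows)      # leave the clean table untouched
    n_missing = round(len(perturbed) * fraction)
    for idx in rng.sample(range(len(perturbed)), n_missing):
        perturbed[idx][column] = None
    return perturbed


def mean_ignoring_missing(rows, column):
    """A data-aware aggregate: skip missing cells instead of failing on them."""
    values = [row[column] for row in rows if row[column] is not None]
    return sum(values) / len(values)


# Toy table (made-up values, for illustration only).
table = [{"id": i, "heart_rate": 60 + i} for i in range(10)]
perturbed = inject_missing(table, "heart_rate", fraction=0.3)

clean_mean = mean_ignoring_missing(table, "heart_rate")
perturbed_mean = mean_ignoring_missing(perturbed, "heart_rate")
```

Comparing an answer computed on `table` with one computed on `perturbed` is the kind of paired evaluation the abstract describes, here reduced to a single artifact type.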

Structure-preserving Image Translation for Depth Estimation in Colonoscopy Video

Aug 19, 2024

Shuxian Wang, Akshay Paruchuri, Zhaoxi Zhang, Sarah McGill, Roni Sengupta

Abstract: Monocular depth estimation in colonoscopy video aims to overcome the unusual lighting properties of the colonoscopic environment. One of the major challenges in this area is the domain gap between annotated but unrealistic synthetic data and unannotated but realistic clinical data. Previous attempts to bridge this domain gap directly target the depth estimation task itself. We propose a general pipeline of structure-preserving synthetic-to-real (sim2real) image translation (producing a modified version of the input image) to retain depth geometry through the translation process. This allows us to generate large quantities of realistic-looking synthetic images for supervised depth estimation with improved generalization to the clinical domain. We also propose a dataset of hand-picked sequences from clinical colonoscopies to improve the image translation process. We demonstrate the simultaneous realism of the translated images and preservation of depth maps via the performance of downstream depth estimation on various datasets.

* 12 pages, 7 figures, accepted at MICCAI 2024

What Are the Odds? Language Models Are Capable of Probabilistic Reasoning

Jun 18, 2024

Akshay Paruchuri, Jake Garrison, Shun Liao, John Hernandez, Jacob Sunshine, Tim Althoff, Xin Liu, Daniel McDuff

Abstract: Language models (LMs) are capable of remarkably complex linguistic tasks; however, numerical reasoning is an area in which they frequently struggle. An important but rarely evaluated form of reasoning is understanding probability distributions. In this paper, we focus on evaluating the probabilistic reasoning capabilities of LMs using idealized and real-world statistical distributions. We perform a systematic evaluation of state-of-the-art LMs on three tasks: estimating percentiles, drawing samples, and calculating probabilities. We evaluate three ways to provide context to LMs: 1) anchoring examples from within a distribution or family of distributions, 2) real-world context, and 3) summary statistics on which to base a Normal approximation. Models can make inferences about distributions, and can be further aided by the incorporation of real-world context, example shots and simplified assumptions, even if these assumptions are incorrect or misspecified. To conduct this work, we developed a comprehensive benchmark distribution dataset with associated question-answer pairs that we will release publicly.

* 21 pages, 9 figures, 2 tables
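To make the "Normal approximation from summary statistics" idea in the abstract above concrete, here is a minimal sketch that uses Python's standard-library `statistics.NormalDist` to do two of the three tasks (estimating percentiles and calculating probabilities). It is not the paper's benchmark code; the helper names and the mean of 70 and standard deviation of 10 are made-up values for illustration.

```python
from statistics import NormalDist


def percentile_of(value, mean, std):
    """Percentile (0-100) of `value` under a Normal(mean, std) approximation."""
    return 100 * NormalDist(mean, std).cdf(value)


def value_at_percentile(pct, mean, std):
    """Inverse lookup: the value sitting at percentile `pct` (0 < pct < 100)."""
    return NormalDist(mean, std).inv_cdf(pct / 100)


def probability_between(low, high, mean, std):
    """P(low <= X <= high) under the same Normal approximation."""
    dist = NormalDist(mean, std)
    return dist.cdf(high) - dist.cdf(low)


# Assumed summary statistics (illustrative only).
mean, std = 70.0, 10.0
pct_of_80 = percentile_of(80, mean, std)               # one std above the mean
median_value = value_at_percentile(50, mean, std)      # the median equals the mean
within_one_std = probability_between(60, 80, mean, std)
```

Reference values computed this way are only as good as the Normal assumption; for skewed real-world distributions the approximation can be misspecified, which is the situation the abstract mentions.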

Transforming Wearable Data into Health Insights using Large Language Model Agents

Jun 11, 2024

Mike A. Merrill, Akshay Paruchuri, Naghmeh Rezaei, Geza Kovacs, Javier Perez, Yun Liu, Erik Schenck, Nova Hammerquist, Jake Sunshine, Shyam Tailor (+10 more)

Abstract: Despite the proliferation of wearable health trackers and the importance of sleep and exercise to health, deriving actionable personalized insights from wearable data remains a challenge because doing so requires non-trivial open-ended analysis of these data. The recent rise of large language model (LLM) agents, which can use tools to reason about and interact with the world, presents a promising opportunity to enable such personalized analysis at scale. Yet, the application of LLM agents in analyzing personal health is still largely untapped. In this paper, we introduce the Personal Health Insights Agent (PHIA), an agent system that leverages state-of-the-art code generation and information retrieval tools to analyze and interpret behavioral health data from wearables. We curate two benchmark question-answering datasets of over 4,000 health insights questions. Based on 650 hours of human and expert evaluation, we find that PHIA can accurately address over 84% of factual numerical questions and more than 83% of crowd-sourced open-ended questions. This work has implications for advancing behavioral health across the population, potentially enabling individuals to interpret their own wearable data, and paving the way for a new era of accessible, personalized wellness regimens that are informed by data-driven insights.

* 38 pages

Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos

Mar 26, 2024

Akshay Paruchuri, Samuel Ehrenstein, Shuxian Wang, Inbar Fried, Stephen M. Pizer, Marc Niethammer, Roni Sengupta

Abstract: Monocular depth estimation in endoscopy videos can enable assistive and robotic surgery to obtain better coverage of the organ and detection of various health issues. Despite promising progress on mainstream, natural image depth estimation, techniques perform poorly on endoscopy images due to a lack of strong geometric features and challenging illumination effects. In this paper, we utilize the photometric cues, i.e., the light emitted from an endoscope and reflected by the surface, to improve monocular depth estimation. We first create two novel loss functions with supervised and self-supervised variants that utilize a per-pixel shading representation. We then propose a novel depth refinement network (PPSNet) that leverages the same per-pixel shading representation. Finally, we introduce teacher-student transfer learning to produce better depth maps from both synthetic data with supervision and clinical data with self-supervision. We achieve state-of-the-art results on the C3VD dataset while estimating high-quality depth maps from clinical data. Our code, pre-trained models, and supplementary materials can be found on our project page: https://ppsnet.github.io/

* 26 pages, 7 tables, 7 figures

Motion Matters: Neural Motion Transfer for Better Camera Physiological Sensing

Apr 02, 2023

Akshay Paruchuri, Xin Liu, Yulu Pan, Shwetak Patel, Daniel McDuff, Soumyadip Sengupta

Abstract: Machine learning models for camera-based physiological measurement can have weak generalization due to a lack of representative training data. Body motion is one of the most significant sources of noise when attempting to recover the subtle cardiac pulse from a video. We explore motion transfer as a form of data augmentation to introduce motion variation while preserving physiological changes. We adapt a neural video synthesis approach to augment videos for the task of remote photoplethysmography (PPG) and study the effects of motion augmentation with respect to 1) the magnitude and 2) the type of motion. After training on motion-augmented versions of publicly available datasets, the presented inter-dataset results on five benchmark datasets show improvements of up to 75% over existing state-of-the-art results. Our findings illustrate the utility of motion transfer as a data augmentation technique for improving the generalization of models for camera-based physiological sensing. We release our code and pre-trained models for using motion transfer as a data augmentation technique on our project page: https://motion-matters.github.io/

* 16 pages, 6 figures, 14 tables
