Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yuexing Hao

MedPAIR: Measuring Physicians and AI Relevance Alignment in Medical Question Answering

May 29, 2025

Yuexing Hao, Kumail Alhamoud, Hyewon Jeong, Haoran Zhang, Isha Puri, Philip Torr, Mike Schaekermann, Ariel D. Stern, Marzyeh Ghassemi

Abstract:Large Language Models (LLMs) have demonstrated remarkable performance on various medical question-answering (QA) benchmarks, including standardized medical exams. However, correct answers alone do not ensure correct logic, and models may reach accurate conclusions through flawed processes. In this study, we introduce the MedPAIR (Medical Dataset Comparing Physicians and AI Relevance Estimation and Question Answering) dataset to evaluate how physician trainees and LLMs prioritize relevant information when answering QA questions. We obtain annotations on 1,300 QA pairs from 36 physician trainees, labeling each sentence within the question components for relevance. We compare these relevance estimates to those for LLMs, and further evaluate the impact of these "relevant" subsets on downstream task performance for both physician trainees and LLMs. We find that LLMs are frequently not aligned with the content relevance estimates of physician trainees. After filtering out physician trainee-labeled irrelevant sentences, accuracy improves for both the trainees and the LLMs. All LLM and physician trainee-labeled data are available at: http://medpair.csail.mit.edu/.

Via

Access Paper or Ask Questions

MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models

May 16, 2025

Xiaomin Li, Mingye Gao, Yuexing Hao, Taoran Li, Guangya Wan, Zihan Wang, Yijun Wang

Abstract:Clinical guidelines, typically structured as decision trees, are central to evidence-based medical practice and critical for ensuring safe and accurate diagnostic decision-making. However, it remains unclear whether Large Language Models (LLMs) can reliably follow such structured protocols. In this work, we introduce MedGUIDE, a new benchmark for evaluating LLMs on their ability to make guideline-consistent clinical decisions. MedGUIDE is constructed from 55 curated NCCN decision trees across 17 cancer types and uses clinical scenarios generated by LLMs to create a large pool of multiple-choice diagnostic questions. We apply a two-stage quality selection process, combining expert-labeled reward models and LLM-as-a-judge ensembles across ten clinical and linguistic criteria, to select 7,747 high-quality samples. We evaluate 25 LLMs spanning general-purpose, open-source, and medically specialized models, and find that even domain-specific LLMs often underperform on tasks requiring structured guideline adherence. We also test whether performance can be improved via in-context guideline inclusion or continued pretraining. Our findings underscore the importance of MedGUIDE in assessing whether LLMs can operate safely within the procedural frameworks expected in real-world clinical settings.

Via

Access Paper or Ask Questions

AI-Based Teat Shape and Skin Condition Prediction for Dairy Management

Dec 22, 2024

Yuexing Hao, Tiancheng Yuan, Yuting Yang, Aarushi Gupta, Matthias Wieland, Ken Birman, Parminder S. Basran

Abstract:Dairy owners spend significant effort to keep their animals healthy. There is good reason to hope that technologies such as computer vision and artificial intelligence (AI) could reduce these costs, yet obstacles arise when adapting advanced tools to farming environments. In this work, we adapt AI tools to dairy cow teat localization, teat shape, and teat skin condition classifications. We also curate a data collection and analysis methodology for a Machine Learning (ML) pipeline. The resulting teat shape prediction model achieves a mean Average Precision (mAP) of 0.783, and the teat skin condition model achieves a mean average precision of 0.828. Our work leverages existing ML vision models to facilitate the individualized identification of teat health and skin conditions, applying AI to the dairy management industry.

Via

Access Paper or Ask Questions

Retrospective Comparative Analysis of Prostate Cancer In-Basket Messages: Responses from Closed-Domain LLM vs. Clinical Teams

Sep 26, 2024

Yuexing Hao, Jason M. Holmes, Jared Hobson, Alexandra Bennett, Daniel K. Ebner, David M. Routman, Satomi Shiraishi, Samir H. Patel, Nathan Y. Yu, Chris L. Hallemeier(+3 more)

Abstract:In-basket message interactions play a crucial role in physician-patient communication, occurring during all phases (pre-, during, and post) of a patient's care journey. However, responding to these patients' inquiries has become a significant burden on healthcare workflows, consuming considerable time for clinical care teams. To address this, we introduce RadOnc-GPT, a specialized Large Language Model (LLM) powered by GPT-4 that has been designed with a focus on radiotherapeutic treatment of prostate cancer with advanced prompt engineering, and specifically designed to assist in generating responses. We integrated RadOnc-GPT with patient electronic health records (EHR) from both the hospital-wide EHR database and an internal, radiation-oncology-specific database. RadOnc-GPT was evaluated on 158 previously recorded in-basket message interactions. Quantitative natural language processing (NLP) analysis and two grading studies with clinicians and nurses were used to assess RadOnc-GPT's responses. Our findings indicate that RadOnc-GPT slightly outperformed the clinical care team in "Clarity" and "Empathy," while achieving comparable scores in "Completeness" and "Correctness." RadOnc-GPT is estimated to save 5.2 minutes per message for nurses and 2.4 minutes for clinicians, from reading the inquiry to sending the response. Employing RadOnc-GPT for in-basket message draft generation has the potential to alleviate the workload of clinical care teams and reduce healthcare costs by producing high-quality, timely responses.

Via

Access Paper or Ask Questions

Attention Round for Post-Training Quantization

Jul 07, 2022

Huabin Diao, Gongyan Li, Shaoyun Xu, Yuexing Hao

Figure 1 for Attention Round for Post-Training Quantization

Figure 2 for Attention Round for Post-Training Quantization

Figure 3 for Attention Round for Post-Training Quantization

Figure 4 for Attention Round for Post-Training Quantization

Abstract:At present, the quantification methods of neural network models are mainly divided into post-training quantization (PTQ) and quantization aware training (QAT). Post-training quantization only need a small part of the data to complete the quantification process, but the performance of its quantitative model is not as good as the quantization aware training. This paper presents a novel quantification method called Attention Round. This method gives parameters w the opportunity to be mapped to all possible quantized values, rather than just the two quantized values nearby w in the process of quantization. The probability of being mapped to different quantified values is negatively correlated with the distance between the quantified values and w, and decay with a Gaussian function. In addition, this paper uses the lossy coding length as a measure to assign bit widths to the different layers of the model to solve the problem of mixed precision quantization, which effectively avoids to solve combinatorial optimization problem. This paper also performs quantitative experiments on different models, the results confirm the effectiveness of the proposed method. For ResNet18 and MobileNetV2, the post-training quantization proposed in this paper only require 1,024 training data and 10 minutes to complete the quantization process, which can achieve quantization performance on par with quantization aware training.

* 18 pages, 5 figures, 5 tables

Via

Access Paper or Ask Questions

REST: Debiased Social Recommendation via Reconstructing Exposure Strategies

Jan 13, 2022

Ruichu Cai, Fengzhu Wu, Zijian Li, Jie Qiao, Wei Chen, Yuexing Hao, Hao Gu

Figure 1 for REST: Debiased Social Recommendation via Reconstructing Exposure Strategies

Figure 2 for REST: Debiased Social Recommendation via Reconstructing Exposure Strategies

Figure 3 for REST: Debiased Social Recommendation via Reconstructing Exposure Strategies

Figure 4 for REST: Debiased Social Recommendation via Reconstructing Exposure Strategies

Abstract:The recommendation system, relying on historical observational data to model the complex relationships among the users and items, has achieved great success in real-world applications. Selection bias is one of the most important issues of the existing observational data based approaches, which is actually caused by multiple types of unobserved exposure strategies (e.g. promotions and holiday effects). Though various methods have been proposed to address this problem, they are mainly relying on the implicit debiasing techniques but not explicitly modeling the unobserved exposure strategies. By explicitly Reconstructing Exposure STrategies (REST in short), we formalize the recommendation problem as the counterfactual reasoning and propose the debiased social recommendation method. In REST, we assume that the exposure of an item is controlled by the latent exposure strategies, the user, and the item. Based on the above generation process, we first provide the theoretical guarantee of our method via identification analysis. Second, we employ a variational auto-encoder to reconstruct the latent exposure strategies, with the help of the social networks and the items. Third, we devise a counterfactual reasoning based recommendation algorithm by leveraging the recovered exposure strategies. Experiments on four real-world datasets, including three published datasets and one private WeChat Official Account dataset, demonstrate significant improvements over several state-of-the-art methods.

Via

Access Paper or Ask Questions

TEA: A Sequential Recommendation Framework via Temporally Evolving Aggregations

Nov 14, 2021

Zijian Li, Ruichu Cai, Fengzhu Wu, Sili Zhang, Hao Gu, Yuexing Hao, Yuguang

Figure 1 for TEA: A Sequential Recommendation Framework via Temporally Evolving Aggregations

Figure 2 for TEA: A Sequential Recommendation Framework via Temporally Evolving Aggregations

Figure 3 for TEA: A Sequential Recommendation Framework via Temporally Evolving Aggregations

Figure 4 for TEA: A Sequential Recommendation Framework via Temporally Evolving Aggregations

Abstract:Sequential recommendation aims to choose the most suitable items for a user at a specific timestamp given historical behaviors. Existing methods usually model the user behavior sequence based on the transition-based methods like Markov Chain. However, these methods also implicitly assume that the users are independent of each other without considering the influence between users. In fact, this influence plays an important role in sequence recommendation since the behavior of a user is easily affected by others. Therefore, it is desirable to aggregate both user behaviors and the influence between users, which are evolved temporally and involved in the heterogeneous graph of users and items. In this paper, we incorporate dynamic user-item heterogeneous graphs to propose a novel sequential recommendation framework. As a result, the historical behaviors as well as the influence between users can be taken into consideration. To achieve this, we firstly formalize sequential recommendation as a problem to estimate conditional probability given temporal dynamic heterogeneous graphs and user behavior sequences. After that, we exploit the conditional random field to aggregate the heterogeneous graphs and user behaviors for probability estimation, and employ the pseudo-likelihood approach to derive a tractable objective function. Finally, we provide scalable and flexible implementations of the proposed framework. Experimental results on three real-world datasets not only demonstrate the effectiveness of our proposed method but also provide some insightful discoveries on sequential recommendation.

Via

Access Paper or Ask Questions

TAG : Type Auxiliary Guiding for Code Comment Generation

May 06, 2020

Ruichu Cai, Zhihao Liang, Boyan Xu, Zijian Li, Yuexing Hao, Yao Chen

Figure 1 for TAG : Type Auxiliary Guiding for Code Comment Generation

Figure 2 for TAG : Type Auxiliary Guiding for Code Comment Generation

Figure 3 for TAG : Type Auxiliary Guiding for Code Comment Generation

Figure 4 for TAG : Type Auxiliary Guiding for Code Comment Generation

Abstract:Existing leading code comment generation approaches with the structure-to-sequence framework ignores the type information of the interpretation of the code, e.g., operator, string, etc. However, introducing the type information into the existing framework is non-trivial due to the hierarchical dependence among the type information. In order to address the issues above, we propose a Type Auxiliary Guiding encoder-decoder framework for the code comment generation task which considers the source code as an N-ary tree with type information associated with each node. Specifically, our framework is featured with a Type-associated Encoder and a Type-restricted Decoder which enables adaptive summarization of the source code. We further propose a hierarchical reinforcement learning method to resolve the training difficulties of our proposed framework. Extensive evaluations demonstrate the state-of-the-art performance of our framework with both the auto-evaluated metrics and case studies.

* ACL 2020, Accepted

Via

Access Paper or Ask Questions

Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme

Jun 07, 2017

Suncong Zheng, Feng Wang, Hongyun Bao, Yuexing Hao, Peng Zhou, Bo Xu

Figure 1 for Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme

Figure 2 for Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme

Figure 3 for Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme

Figure 4 for Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme

Abstract:Joint extraction of entities and relations is an important task in information extraction. To tackle this problem, we firstly propose a novel tagging scheme that can convert the joint extraction task to a tagging problem. Then, based on our tagging scheme, we study different end-to-end models to extract entities and their relations directly, without identifying entities and relations separately. We conduct experiments on a public dataset produced by distant supervision method and the experimental results show that the tagging based methods are better than most of the existing pipelined and joint learning methods. What's more, the end-to-end model proposed in this paper, achieves the best results on the public dataset.

Via

Access Paper or Ask Questions