Title: Huatuo-26M, a Large-scale Chinese Medical QA Dataset

May 02, 2023

Jianquan Li, Xidong Wang, Xiangbo Wu, Zhiyi Zhang, Xiaolong Xu, Jie Fu, Prayag Tiwari, Xiang Wan, Benyou Wang

Abstract: In this paper, we release the largest-ever medical Question Answering (QA) dataset, with 26 million QA pairs. We benchmark many existing approaches on our dataset in terms of both retrieval and generation. Experimental results show that existing models perform far below expectations and that the released dataset remains challenging in the era of pre-trained language models. Moreover, we experimentally show the benefit of the proposed dataset in several respects: (i) training models that are applied to other QA datasets in a zero-shot fashion; (ii) serving as external knowledge for retrieval-augmented generation (RAG); and (iii) improving existing pre-trained language models by using the QA pairs as a pre-training corpus for continued training. We believe that this dataset will not only contribute to medical research but also benefit both patients and clinical doctors. See https://github.com/FreedomIntelligence/Huatuo-26M
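To make use (ii) concrete, here is a minimal sketch of using QA pairs as external knowledge for retrieval-augmented generation. It is not the authors' pipeline. The three QA pairs are invented English placeholders rather than records from Huatuo-26M, the bag-of-words cosine retriever stands in for whatever retriever the paper benchmarks, and the names `retrieve` and `build_prompt` are hypothetical. Whitespace tokenization is also an assumption that would not carry over to Chinese text, which would need a proper word segmenter.

```python
import math
from collections import Counter

# Hypothetical toy QA pairs (NOT taken from Huatuo-26M), used only for illustration.
QA_PAIRS = [
    ("What are common symptoms of influenza?",
     "Fever, cough, sore throat, and muscle aches."),
    ("How is hypertension usually managed?",
     "Lifestyle changes and, when needed, antihypertensive medication."),
    ("What causes iron deficiency anemia?",
     "Insufficient iron intake, poor absorption, or blood loss."),
]


def tokenize(text):
    """Lowercase whitespace tokenization with surrounding punctuation stripped."""
    tokens = (t.strip("?.,!").lower() for t in text.split())
    return [t for t in tokens if t]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, pairs, k=1):
    """Return the k QA pairs whose question is most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        pairs,
        key=lambda pair: cosine(query_vec, Counter(tokenize(pair[0]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, pairs, k=1):
    """Prepend the retrieved QA pairs to the query as context for a generator."""
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in retrieve(query, pairs, k))
    return f"Reference QA pairs:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    print(build_prompt("What are the symptoms of influenza?", QA_PAIRS))
```

The resulting prompt would then be passed to a generation model of the reader's choice. That step is omitted here because the abstract does not specify one.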
