Chun Peng

A Comprehensive Evaluation of Large Language Models on Benchmark Biomedical Text Processing Tasks

Oct 10, 2023

Israt Jahan, Md Tahmid Rahman Laskar, Chun Peng, Jimmy Huang

Abstract: Recently, Large Language Models (LLMs) have demonstrated impressive capability to solve a wide range of tasks. However, despite their success across various tasks, no prior work has investigated their capability in the biomedical domain yet. To this end, this paper aims to evaluate the performance of LLMs on benchmark biomedical tasks. For this purpose, we conduct a comprehensive evaluation of 4 popular LLMs in 6 diverse biomedical tasks across 26 datasets. To the best of our knowledge, this is the first work that conducts an extensive evaluation and comparison of various LLMs in the biomedical domain. Interestingly, we find based on our evaluation that in biomedical datasets that have smaller training sets, zero-shot LLMs even outperform the current state-of-the-art fine-tuned biomedical models. This suggests that pretraining on large text corpora makes LLMs quite specialized even in the biomedical domain. We also find that no single LLM outperforms the others in all tasks; the performance of different LLMs varies depending on the task. While their performance is still quite poor in comparison to the biomedical models that were fine-tuned on large training sets, our findings demonstrate that LLMs have the potential to be a valuable tool for various biomedical tasks that lack large annotated data.

* Extended version of the following BioNLP paper: https://aclanthology.org/2023.bionlp-1.30/ (arXiv:2306.04504). arXiv admin note: substantial text overlap with arXiv:2306.04504

Evaluation of ChatGPT on Biomedical Tasks: A Zero-Shot Comparison with Fine-Tuned Generative Transformers

Jun 07, 2023

Israt Jahan, Md Tahmid Rahman Laskar, Chun Peng, Jimmy Huang

Abstract: ChatGPT is a large language model developed by OpenAI. Despite its impressive performance across various tasks, no prior work has investigated its capability in the biomedical domain yet. To this end, this paper aims to evaluate the performance of ChatGPT on various benchmark biomedical tasks, such as relation extraction, document classification, question answering, and summarization. To the best of our knowledge, this is the first work that conducts an extensive evaluation of ChatGPT in the biomedical domain. Interestingly, we find based on our evaluation that in biomedical datasets that have smaller training sets, zero-shot ChatGPT even outperforms the state-of-the-art fine-tuned generative transformer models, such as BioGPT and BioBART. This suggests that ChatGPT's pre-training on large text corpora makes it quite specialized even in the biomedical domain. Our findings demonstrate that ChatGPT has the potential to be a valuable tool for various tasks in the biomedical domain that lack large annotated data.

* Accepted by BioNLP@ACL 2023
