Title: MediSwift: Efficient Sparse Pre-trained Biomedical Language Models

Mar 01, 2024

Vithursan Thangarasa, Mahmoud Salem, Shreyas Saxena, Kevin Leong, Joel Hestness, Sean Lie


Abstract: Large language models (LLMs) are typically trained on general source data for various domains, but a recent surge in domain-specific LLMs has shown their potential to outperform general-purpose models in domain-specific tasks (e.g., biomedicine). Although domain-specific pre-training enhances efficiency and leads to smaller models, the computational costs of training these LLMs remain high, posing budgeting challenges. We introduce MediSwift, a suite of biomedical LMs that leverage sparse pre-training on domain-specific biomedical text data. By inducing up to 75% weight sparsity during the pre-training phase, MediSwift achieves a 2-2.5x reduction in training FLOPs. Notably, all sparse pre-training was performed on the Cerebras CS-2 system, which is specifically designed to realize the acceleration benefits from unstructured weight sparsity, thereby significantly enhancing the efficiency of the MediSwift models. Through subsequent dense fine-tuning and strategic soft prompting, MediSwift models outperform existing LLMs of up to 7B parameters on biomedical tasks, setting new benchmarks with respect to the efficiency-accuracy trade-off on tasks such as PubMedQA. Our results show that sparse pre-training, along with dense fine-tuning and soft prompting, offers an effective method for creating high-performing, computationally efficient models in specialized domains.
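To give a feel for how a weight sparsity level relates to an overall FLOP reduction, here is a minimal back-of-the-envelope sketch. It is not the paper's accounting: it assumes an Amdahl-style split in which only a fraction of the total FLOPs comes from sparsifiable weight matrices, and that those FLOPs scale with the fraction of nonzero weights. The function name `overall_flop_reduction` and the parameter `sparsifiable_fraction` are hypothetical, and the example fractions are illustrative values, not figures from the abstract.

```python
def overall_flop_reduction(sparsity: float, sparsifiable_fraction: float) -> float:
    """Amdahl-style estimate of the total FLOP reduction factor.

    sparsity: fraction of weights set to zero (e.g. 0.75 for 75% sparsity).
    sparsifiable_fraction: ASSUMED share of total FLOPs spent in layers
        whose cost scales with the number of nonzero weights. This value
        is a free parameter of the sketch, not a number from the paper.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    if not 0.0 <= sparsifiable_fraction <= 1.0:
        raise ValueError("sparsifiable_fraction must be in [0, 1]")

    # FLOPs left after sparsification, relative to a dense baseline of 1.0:
    # the dense part is unchanged, the sparse part shrinks by (1 - sparsity).
    remaining = (1.0 - sparsifiable_fraction) + sparsifiable_fraction * (1.0 - sparsity)
    return 1.0 / remaining


# Illustrative sweep at 75% sparsity over hypothetical FLOP splits.
for fraction in (1.0, 0.8, 0.6):
    factor = overall_flop_reduction(0.75, fraction)
    print(f"sparsifiable fraction {fraction:.1f}: about {factor:.2f}x fewer FLOPs")
```

Under these assumptions, 75% sparsity yields a 4x reduction only if every FLOP is sparsifiable. Once some of the compute stays dense, the overall factor falls below that ceiling.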
