Title: Annotated Dataset Creation through General Purpose Language Models for non-English Medical NLP

Aug 30, 2022

Johann Frei, Frank Kramer

Abstract: Obtaining text datasets with semantic annotations is an effortful process, yet crucial for supervised training in natural language processing (NLP). In general, developing and applying new NLP pipelines in domain-specific contexts often requires custom-designed datasets to address NLP tasks in a supervised machine learning fashion. When operating in non-English languages for medical data processing, this exposes several minor and major interconnected problems, such as the lack of task-matching datasets and of task-specific pre-trained models. In our work, we suggest leveraging pretrained language models for training data acquisition in order to retrieve sufficiently large datasets for training smaller and more efficient models for use-case-specific tasks. To demonstrate the effectiveness of our approach, we create a custom dataset which we use to train a medical NER model for German texts, GPTNERMED, yet our method remains language-independent in principle. Our obtained dataset as well as our pre-trained models are publicly available at: https://github.com/frankkramer-lab/GPTNERMED
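As a rough illustration of the data-acquisition idea, the sketch below turns one language-model-generated sentence with inline entity markup into plain text plus character-offset spans, the form typically needed to train an NER model. The `[text](LABEL)` markup, the `Medikation` label, and the names `TAG` and `parse_annotated` are assumptions made for this example; the abstract does not specify the format the authors use.

```python
import re
from typing import List, Tuple

# Hypothetical inline markup "[entity text](LABEL)"; an assumption for
# illustration, not the format documented for GPTNERMED.
TAG = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_annotated(text: str) -> Tuple[str, List[Tuple[int, int, str]]]:
    """Strip the markup and return (plain_text, [(start, end, label), ...])."""
    parts: List[str] = []
    spans: List[Tuple[int, int, str]] = []
    pos = 0      # read position in the marked-up input
    length = 0   # length of the plain text built so far
    for match in TAG.finditer(text):
        # Copy the unannotated text that precedes this entity.
        before = text[pos:match.start()]
        parts.append(before)
        length += len(before)
        # Record the entity span using offsets into the plain text.
        entity, label = match.group(1), match.group(2)
        spans.append((length, length + len(entity), label))
        parts.append(entity)
        length += len(entity)
        pos = match.end()
    parts.append(text[pos:])
    return "".join(parts), spans


generated = "Der Patient erhielt [Ibuprofen](Medikation) 400 mg."
plain, spans = parse_annotated(generated)
for start, end, label in spans:
    print(label, plain[start:end])
```

Spans in this form can be passed to most NER training toolkits after filtering out malformed generations, a step that any real pipeline of this kind would likely need.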
