Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Byeong-Cheol Jo

DAGAM: Data Augmentation with Generation And Modification

Apr 06, 2022

Byeong-Cheol Jo, Tak-Sung Heo, Yeongjoon Park, Yongmin Yoo, Won Ik Cho, Kyungsun Kim

Figure 1 for DAGAM: Data Augmentation with Generation And Modification

Figure 2 for DAGAM: Data Augmentation with Generation And Modification

Figure 3 for DAGAM: Data Augmentation with Generation And Modification

Figure 4 for DAGAM: Data Augmentation with Generation And Modification

Abstract:Text classification is a representative downstream task of natural language processing, and has exhibited excellent performance since the advent of pre-trained language models based on Transformer architecture. However, in pre-trained language models, under-fitting often occurs due to the size of the model being very large compared to the amount of available training data. Along with significant importance of data collection in modern machine learning paradigm, studies have been actively conducted for natural language data augmentation. In light of this, we introduce three data augmentation schemes that help reduce underfitting problems of large-scale language models. Primarily we use a generation model for data augmentation, which is defined as Data Augmentation with Generation (DAG). Next, we augment data using text modification techniques such as corruption and word order change (Data Augmentation with Modification, DAM). Finally, we propose Data Augmentation with Generation And Modification (DAGAM), which combines DAG and DAM techniques for a boosted performance. We conduct data augmentation for six benchmark datasets of text classification task, and verify the usefulness of DAG, DAM, and DAGAM through BERT-based fine-tuning and evaluation, deriving better results compared to the performance with original datasets.

Via

Access Paper or Ask Questions

Medical Code Prediction from Discharge Summary: Document to Sequence BERT using Sequence Attention

Jul 05, 2021

Tak-Sung Heo, Yongmin Yoo, Yeongjoon Park, Byeong-Cheol Jo, Kyungsun Kim

Figure 1 for Medical Code Prediction from Discharge Summary: Document to Sequence BERT using Sequence Attention

Figure 2 for Medical Code Prediction from Discharge Summary: Document to Sequence BERT using Sequence Attention

Figure 3 for Medical Code Prediction from Discharge Summary: Document to Sequence BERT using Sequence Attention

Figure 4 for Medical Code Prediction from Discharge Summary: Document to Sequence BERT using Sequence Attention

Abstract:Clinical notes are unstructured text generated by clinicians during patient encounters. Clinical notes are usually accompanied by a set of metadata codes from the International Classification of Diseases(ICD). ICD code is an important code used in various operations, including insurance, reimbursement, medical diagnosis, etc. Therefore, it is important to classify ICD codes quickly and accurately. However, annotating these codes is costly and time-consuming. So we propose a model based on bidirectional encoder representations from transformers (BERT) using the sequence attention method for automatic ICD code assignment. We evaluate our approach on the medical information mart for intensive care III (MIMIC-III) benchmark dataset. Our model achieved performance of macro-averaged F1: 0.62898 and micro-averaged F1: 0.68555 and is performing better than a performance of the state-of-the-art model using the MIMIC-III dataset. The contribution of this study proposes a method of using BERT that can be applied to documents and a sequence attention method that can capture important sequence in-formation appearing in documents.

Via

Access Paper or Ask Questions