Tak-Sung Heo

Multi label classification of Artificial Intelligence related patents using Modified D2SBERT and Sentence Attention mechanism

Mar 03, 2023

Yongmin Yoo, Tak-Sung Heo, Dongjin Lim, Deaho Seo

Abstract: Patent classification is an essential task in patent information management and patent knowledge mining. Classifying patents related to artificial intelligence, currently one of the most prominent technology topics, is particularly important. However, artificial intelligence-related patents are very difficult to classify because they mix complex technologies with legal terms. Moreover, because current algorithms perform unsatisfactorily, classification is still mostly done manually, which wastes a lot of time and money. Therefore, we present a method for classifying artificial intelligence-related patents published by the USPTO using natural language processing techniques and deep learning. We use a modified BERT and sentence attention to overcome the limitations of BERT. In our experiments, the method achieves the highest performance among the deep learning methods compared.

DAGAM: Data Augmentation with Generation And Modification

Apr 06, 2022

Byeong-Cheol Jo, Tak-Sung Heo, Yeongjoon Park, Yongmin Yoo, Won Ik Cho, Kyungsun Kim

Abstract: Text classification is a representative downstream task of natural language processing and has exhibited excellent performance since the advent of pre-trained language models based on the Transformer architecture. However, pre-trained language models often underfit because the model is very large compared to the amount of available training data. Given the importance of data collection in the modern machine learning paradigm, natural language data augmentation has been actively studied. In light of this, we introduce three data augmentation schemes that help reduce the underfitting problems of large-scale language models. First, we use a generation model for data augmentation, which we define as Data Augmentation with Generation (DAG). Next, we augment data using text modification techniques such as corruption and word order change (Data Augmentation with Modification, DAM). Finally, we propose Data Augmentation with Generation And Modification (DAGAM), which combines the DAG and DAM techniques for boosted performance. We apply data augmentation to six benchmark datasets for text classification and verify the usefulness of DAG, DAM, and DAGAM through BERT-based fine-tuning and evaluation, obtaining better results than with the original datasets.
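
The modification step (DAM) is described only as "corruption and word order change", so the following is a minimal sketch under assumptions rather than the authors' procedure. It assumes corruption means deleting one random character from some words and word order change means shuffling all words in a sentence. The function names `corrupt_words`, `shuffle_word_order`, and `augment`, and the `rate` parameter, are hypothetical.

```python
import random


def corrupt_words(text, rate=0.1, rng=None):
    """Delete one random character from a fraction of the words.

    Assumption: this stands in for "corruption"; the paper may use a
    different corruption scheme.
    """
    rng = rng or random.Random(0)
    out = []
    for word in text.split():
        # Only corrupt words longer than one character so no word vanishes.
        if len(word) > 1 and rng.random() < rate:
            i = rng.randrange(len(word))
            word = word[:i] + word[i + 1:]
        out.append(word)
    return " ".join(out)


def shuffle_word_order(text, rng=None):
    """Return the same words in a random order (assumed "word order change")."""
    rng = rng or random.Random(0)
    words = text.split()
    rng.shuffle(words)
    return " ".join(words)


def augment(dataset, rng=None):
    """Append one modified copy of each (text, label) pair; labels are unchanged."""
    rng = rng or random.Random(0)
    augmented = list(dataset)
    for text, label in dataset:
        modified = shuffle_word_order(corrupt_words(text, rate=0.3, rng=rng), rng=rng)
        augmented.append((modified, label))
    return augmented
```

Calling `augment` on a list of `(text, label)` pairs doubles its size while keeping the original examples first, so the augmented set can be passed to any fine-tuning loop in place of the original data.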

Solar cell patent classification method based on keyword extraction and deep neural network

Sep 18, 2021

Yongmin Yoo, Dongjin Lim, Tak-Sung Heo

Abstract: With the growing impact of ESG on businesses, research related to renewable energy is receiving great attention. Solar cells are one such area, so solar cell patent analysis has high research value. Accurately analyzing and classifying patent documents can reveal important technical relationships and describe the business trends in a technology. It can also inspire new industrial solutions and inform important investment decisions. Therefore, we must carefully analyze patent documents and make use of the value of patents. To solve the solar cell patent classification problem, we propose a keyword extraction method and a deep neural network-based solar cell patent classification method. First, solar cell patents are preprocessed. We then use the KeyBERT algorithm to extract keywords and key phrases from the patent abstracts and construct a lexical dictionary. Next, we build a solar cell patent classification model based on a deep neural network. Finally, we use this model to classify power patents; the training accuracy is greater than 95% and the validation accuracy is about 87.5%. The results show that the deep neural network method can classify complex and difficult solar cell patents and does so with good accuracy.

Medical Code Prediction from Discharge Summary: Document to Sequence BERT using Sequence Attention

Jul 05, 2021

Tak-Sung Heo, Yongmin Yoo, Yeongjoon Park, Byeong-Cheol Jo, Kyungsun Kim

Abstract: Clinical notes are unstructured text generated by clinicians during patient encounters. They are usually accompanied by a set of metadata codes from the International Classification of Diseases (ICD). ICD codes are used in various operations, including insurance, reimbursement, and medical diagnosis. Therefore, it is important to assign ICD codes quickly and accurately. However, annotating these codes is costly and time-consuming. We therefore propose a model based on bidirectional encoder representations from transformers (BERT) that uses a sequence attention method for automatic ICD code assignment. We evaluate our approach on the Medical Information Mart for Intensive Care III (MIMIC-III) benchmark dataset. Our model achieves a macro-averaged F1 of 0.62898 and a micro-averaged F1 of 0.68555, outperforming the state-of-the-art model on the MIMIC-III dataset. The contributions of this study are a method of applying BERT to documents and a sequence attention method that captures important sequence information appearing in documents.
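
The two reported metrics differ in how they aggregate over codes: macro-averaged F1 averages the per-code F1 scores, while micro-averaged F1 pools true positives, false positives, and false negatives across all codes first. The sketch below shows both for multi-label predictions. It is an illustration of the standard definitions, not the paper's evaluation script; the function names and the placeholder labels `A`, `B`, `C` are hypothetical.

```python
def f1(tp, fp, fn):
    """F1 from counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred, labels):
    """Return (micro_f1, macro_f1) for multi-label data.

    y_true and y_pred are lists of label sets, one set per document.
    """
    counts = {label: [0, 0, 0] for label in labels}  # tp, fp, fn per label
    for true, pred in zip(y_true, y_pred):
        for label in labels:
            if label in pred and label in true:
                counts[label][0] += 1
            elif label in pred:
                counts[label][1] += 1
            elif label in true:
                counts[label][2] += 1
    # Macro: average the per-label F1 scores, so rare labels weigh as much as common ones.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    # Micro: pool the counts over all labels, so frequent labels dominate.
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    micro = f1(tp, fp, fn)
    return micro, macro


# Toy example with placeholder labels (not real ICD codes).
y_true = [{"A", "B"}, {"A"}, {"C"}]
y_pred = [{"A"}, {"A", "B"}, {"B"}]
micro, macro = micro_macro_f1(y_true, y_pred, labels=["A", "B", "C"])
```

In this toy example label `A` is always predicted correctly while `B` and `C` are never matched, so the macro score (one third) falls below the micro score (one half), which is the usual pattern when rare codes are predicted poorly.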

A novel hybrid methodology of measuring sentence similarity

May 20, 2021

Yongmin Yoo, Tak-Sung Heo, Yeongjoon Park

Abstract: Measuring sentence similarity accurately is an essential problem in natural language processing (NLP). There are many approaches to measuring sentence similarity. Deep learning shows state-of-the-art performance in many natural language processing fields and is widely used in sentence similarity measurement. However, in natural language processing, it is also important to consider the structure of the sentence and of the words that make up the sentence. In this study, we propose a methodology that combines deep learning with a method that considers lexical relationships. Our evaluation metrics are the Pearson correlation coefficient and the Spearman correlation coefficient. The proposed method outperforms current approaches on KorSTS, a standard Korean benchmark dataset, and achieves up to a 65% improvement over using deep learning alone. Experiments show that our proposed method generally performs better than a deep learning model alone.
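
The two evaluation metrics named in this abstract can be sketched in a few lines. Pearson correlation measures the linear agreement between predicted and gold similarity scores, while Spearman correlation applies the same formula to ranks, so it only rewards getting the ordering right. This is a plain-Python illustration of the standard definitions, not the authors' evaluation code; the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


# Hypothetical gold scores and model predictions for six sentence pairs.
gold = [0, 1, 2, 3, 4, 5]
pred = [0, 1, 4, 9, 16, 25]
```

Here `pred` preserves the ordering of `gold` but not its spacing, so `spearman(gold, pred)` is 1 (up to floating-point rounding) while `pearson(gold, pred)` is strictly below 1.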
