Abstract: In low-resource settings, data augmentation strategies are commonly leveraged to improve performance. Numerous approaches have applied augmentation at the document level (e.g., for text classification), but few studies have explored token-level augmentation. Performed naively, data augmentation can produce semantically incongruent and ungrammatical examples. In this work, we compare simple masked language model replacement with an augmentation method based on constituency tree mutations to improve the performance of named entity recognition in low-resource settings, with the aim of preserving the linguistic cohesion of the augmented sentences.
Abstract: One of the tasks in aspect-based sentiment analysis is to extract aspect and opinion terms from review text. Our study focuses on evaluating transfer learning using BERT (Devlin et al., 2019) to classify tokens from hotel reviews in Indonesian (bahasa Indonesia). We show that the default BERT model fails to outperform a simple argmax method. However, replacing the default BERT tokenizer with our custom one improves the F1 scores on our labels of interest by at least 5%; for I-ASPECT and B-SENTIMENT, it even increases the F1 scores by 11%. On entity-level evaluation, our tokenizer tweak achieves F1 scores of 87% and 89% for the ASPECT and SENTIMENT labels, respectively. These scores are only 2% below the best model by Fernando et al. (2019), but with much less training effort (8 vs. 200 epochs).