Abstract: A benchmark provides an ecosystem for measuring the advancement of models with standard datasets and automatic and human evaluation metrics. We introduce IndoNLG, the first such benchmark for natural language generation (NLG) in the Indonesian language. It covers six tasks: summarization, question answering, open chitchat, and machine translation across three different language pairs. We provide a vast and clean pre-training corpus of Indonesian, Sundanese, and Javanese datasets, called Indo4B-Plus, which is used to train our pre-trained NLG model, IndoBART. We evaluate the effectiveness and efficiency of IndoBART through extensive evaluation on all IndoNLG tasks. Our findings show that IndoBART achieves competitive performance on Indonesian tasks with five times fewer parameters than the largest multilingual model in our benchmark, mBART-LARGE (Liu et al., 2020), and almost 4x and 2.5x faster inference on CPU and GPU, respectively. We additionally demonstrate IndoBART's ability to learn Javanese and Sundanese, achieving decent performance on machine translation tasks.
Abstract: In this paper, we propose a multi-document summarization system for Indonesian news articles that uses semantic role labeling (SRL) and a semantic graph. To improve on an existing summarizer that employed subject, predicate, object, and adverbial (SVOA) extraction to obtain predicate-argument structures (PAS), our system replaces SVOA extraction with an SRL model for Indonesian. We also replace the genetic algorithm for identifying important PAS with a decision tree classifier, since the summarizer performed better without the genetic algorithm. A decision tree model with 10 features achieved better performance than one with 4 sentence features. Experiments and evaluations are conducted to generate 100-word and 200-word summaries. The evaluation shows the proposed model achieves an average ROUGE-2 recall of 0.313 for 100-word summaries and 0.394 for 200-word summaries.
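As a concrete reference for the numbers reported above, the following is a minimal sketch of ROUGE-2 recall, the evaluation metric used in that abstract. It is a simplified, whitespace-tokenized version for illustration, not the official ROUGE toolkit:

```python
from collections import Counter

def bigrams(tokens):
    return list(zip(tokens, tokens[1:]))

def rouge2_recall(reference, summary):
    """Fraction of the reference's bigrams that also appear in the summary."""
    ref = Counter(bigrams(reference.lower().split()))
    cand = Counter(bigrams(summary.lower().split()))
    overlap = sum(min(n, cand[bg]) for bg, n in ref.items())
    return overlap / sum(ref.values()) if ref else 0.0

# 2 of the 4 reference bigrams are covered by the summary -> 0.5
print(rouge2_recall("budi membaca buku di taman", "budi membaca buku"))
```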
Abstract: Although previous research on aspect-based sentiment analysis (ABSA) for Indonesian hotel reviews has been conducted using CNN and XGBoost, the resulting model did not generalize well to test data, and a high number of out-of-vocabulary (OOV) words contributed to misclassification cases. Nowadays, most state-of-the-art results for a wide array of NLP tasks are achieved by utilizing pretrained language representations. In this paper, we incorporate one of the foremost language representation models, BERT, to perform ABSA on an Indonesian review dataset. By combining multilingual BERT (m-BERT) with a task transformation method, we achieve a significant improvement of 8% in F1-score over the result of our previous study.
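One common reading of such a task transformation is to recast ABSA as sentence-pair classification, pairing the review text with an auxiliary sentence naming the aspect. The sketch below illustrates this with m-BERT; the aspect query and the three-way label set are illustrative assumptions, and the classification head would still need fine-tuning on labeled pairs before its outputs are meaningful:

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

MODEL = "bert-base-multilingual-cased"  # m-BERT
tokenizer = BertTokenizer.from_pretrained(MODEL)
model = BertForSequenceClassification.from_pretrained(MODEL, num_labels=3)  # assumed pos/neg/neu labels

review = "Kamar bersih tetapi AC tidak dingin."  # "The room is clean but the AC is not cold."
aspect = "air conditioner"                       # hypothetical auxiliary sentence for the aspect
inputs = tokenizer(review, aspect, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(-1))  # untrained head: probabilities are meaningful only after fine-tuning
```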
Abstract: Abstract Meaning Representation (AMR) captures rich information about a sentence, such as semantic relations, coreferences, and named-entity relations, in a single representation. However, research on AMR parsing for Indonesian sentences is fairly limited. In this paper, we develop a system that parses Indonesian sentences using a machine learning approach. Based on the work of Zhang et al., our system consists of three steps: pair prediction, label prediction, and graph construction. Pair prediction uses a dependency parsing component to obtain the edges between words for the AMR. Its output is passed to label prediction, which uses a supervised learning algorithm to predict the labels of the AMR edges. We use a dataset of simple sentences gathered from articles and news. Our model achieves a SMATCH score of 0.820 on the simple-sentence test data.
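To make the shape of the three-step pipeline concrete, here is a toy sketch; the two predictor functions are hypothetical stand-ins for the dependency parser and the trained label classifier described above, not the paper's actual components:

```python
import networkx as nx

def predict_pairs(tokens):
    # stand-in for pair prediction: attach every token to the verb at index 1
    return [(1, i) for i in range(len(tokens)) if i != 1]

def predict_label(head, dep):
    # stand-in for the supervised label classifier
    return ":ARG0" if dep < head else ":ARG1"

def build_amr(tokens):
    # graph construction: assemble predicted edges and labels into a digraph
    g = nx.DiGraph()
    for head, dep in predict_pairs(tokens):
        g.add_edge(tokens[head], tokens[dep], label=predict_label(head, dep))
    return g

amr = build_amr(["Budi", "membaca", "buku"])  # "Budi reads a book"
print(list(amr.edges(data=True)))
```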
Abstract: Extracting aspect and opinion terms from review texts is one of the key tasks in aspect-based sentiment analysis. To extract aspect and opinion terms from Indonesian hotel reviews, we adapt the double-embeddings feature and attention mechanism that outperformed the best systems at SemEval 2015 and 2016. We conduct experiments on 4000 reviews to find the best configuration and to show the influence of double embeddings and the attention mechanism on model performance. Using 1000 reviews for evaluation, we achieve F1-measures of 0.914 and 0.90 for aspect and opinion term extraction at the token and entity (term) levels, respectively.
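The double-embeddings idea, concatenating a general-purpose embedding with a domain-specific one per token before sequence labeling, can be sketched as follows. The dimensions, the BiLSTM encoder, and the self-attention layer are illustrative assumptions rather than the paper's exact architecture:

```python
import torch
import torch.nn as nn

class DoubleEmbeddingTagger(nn.Module):
    def __init__(self, vocab=10000, gen_dim=300, dom_dim=100, n_tags=5):
        super().__init__()
        self.general = nn.Embedding(vocab, gen_dim)  # e.g. general-purpose pretrained vectors
        self.domain = nn.Embedding(vocab, dom_dim)   # e.g. vectors trained on hotel reviews
        self.encoder = nn.LSTM(gen_dim + dom_dim, 128, bidirectional=True, batch_first=True)
        self.attn = nn.MultiheadAttention(256, num_heads=4, batch_first=True)
        self.out = nn.Linear(256, n_tags)            # BIO tags for aspect/opinion terms

    def forward(self, token_ids):
        # double embeddings: concatenate both representations per token
        x = torch.cat([self.general(token_ids), self.domain(token_ids)], dim=-1)
        h, _ = self.encoder(x)
        a, _ = self.attn(h, h, h)                    # self-attention over the sequence
        return self.out(a)

logits = DoubleEmbeddingTagger()(torch.randint(0, 10000, (2, 12)))
print(logits.shape)  # (batch, seq_len, n_tags)
```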
Abstract: A question answering (QA) system was developed that uses graph-pattern association rules over the YAGO knowledge base. The system takes a user question as input and provides an answer as output. If the answer is missing or unavailable in the database, graph-pattern association rules are used to derive it. The architecture of the system consists of question classification, graph component generation, query generation, and query processing, applied in a waterfall model. In this paper, the architecture of the system is described, with particular attention to its reasoning and performance capabilities. The results of this research show that rules with high confidence and correct logic produce correct answers, and vice versa.
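The fallback behaviour described above, answering from the knowledge base when a matching fact exists and otherwise chaining a high-confidence association rule, can be sketched as follows. The facts and the rule are invented toy examples, not actual YAGO content:

```python
# Toy KB: (subject, predicate) -> object
facts = {("Budi", "bornIn"): "Bandung", ("Bandung", "isLocatedIn"): "Indonesia"}

# Rule: bornIn(x, c) and isLocatedIn(c, y) => isCitizenOf(x, y)
rules = [{"head": "isCitizenOf", "body": ["bornIn", "isLocatedIn"], "confidence": 0.9}]

def answer(subject, predicate, min_conf=0.8):
    if (subject, predicate) in facts:  # direct lookup in the KB
        return facts[(subject, predicate)]
    for rule in rules:                 # fall back to high-confidence rules
        if rule["head"] == predicate and rule["confidence"] >= min_conf:
            node = subject
            for p in rule["body"]:     # walk the rule body as a path in the KB
                node = facts.get((node, p))
                if node is None:
                    break
            else:
                return node
    return None

print(answer("Budi", "isCitizenOf"))   # "Indonesia", derived via the rule
```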
Abstract: Imbalanced datasets occur due to the uneven distribution of data available in the real world, such as the disposition of complaints to government offices in Bandung. Consequently, multi-label text categorization algorithms may not produce the best performance, because classifiers tend to be weighed down by the majority of the data and to ignore the minority. In this paper, bagging and adaptive boosting algorithms are employed to handle this issue and improve the performance of text categorization. The results are evaluated with four metrics: Hamming loss, subset accuracy, example-based accuracy, and micro-averaged F-measure. Bagging.ML-LP with an SMO weak classifier performs best in terms of subset accuracy and example-based accuracy, while Bagging.ML-BR with an SMO weak classifier has the best micro-averaged F-measure. On the other hand, AdaBoost.MH with a J48 weak classifier has the lowest Hamming loss. Thus, both algorithms have high potential to boost the performance of text categorization, but only with certain weak classifiers. Overall, bagging has more potential than adaptive boosting for increasing the accuracy of minority labels.
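A minimal sketch of binary-relevance bagging with an SVM base learner, evaluated with the four reported metrics, using scikit-learn on synthetic data. SVC stands in for the SMO classifier, and per-sample Jaccard similarity is used for example-based accuracy; both are assumptions about the original setup:

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score, f1_score, hamming_loss, jaccard_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Synthetic multi-label data as a stand-in for the complaint corpus
X, Y = make_multilabel_classification(n_samples=500, n_classes=5, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

# Binary relevance: one bagged SVM ensemble per label
clf = OneVsRestClassifier(BaggingClassifier(SVC(), n_estimators=10, random_state=0))
clf.fit(X_tr, Y_tr)
pred = clf.predict(X_te)

print("Hamming loss:          ", hamming_loss(Y_te, pred))
print("Subset accuracy:       ", accuracy_score(Y_te, pred))  # exact-match ratio
print("Example-based accuracy:", jaccard_score(Y_te, pred, average="samples"))
print("Micro-averaged F1:     ", f1_score(Y_te, pred, average="micro"))
```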