Abstract:This paper analyses how traditional baseline metrics, such as BLEU and TER, and neural-based methods, such as BERTScore and COMET, score several NMT models performance on chat translation and how these metrics perform when compared to human-annotated scores. The results show that for ranking NMT models in chat translations, all metrics seem consistent in deciding which model outperforms the others. This implies that traditional baseline metrics, which are faster and simpler to use, can still be helpful. On the other hand, when it comes to better correlation with human judgment, neural-based metrics outperform traditional metrics, with COMET achieving the highest correlation with the human-annotated score on a chat translation. However, we show that even the best metric struggles when scoring English translations from sentences with anaphoric zero-pronoun in Japanese.
Abstract:This research explores the applicability of cross-lingual transfer learning from English to Japanese and Indonesian using the XLM-R pre-trained model. The results are compared with several previous works, either by models using a similar zero-shot approach or a fully-supervised approach, to provide an overview of the zero-shot transfer learning approach's capability using XLM-R in comparison with existing models. Our models achieve the best result in one Japanese dataset and comparable results in other datasets in Japanese and Indonesian languages without being trained using the target language. Furthermore, the results suggest that it is possible to train a multi-lingual model, instead of one model for each language, and achieve promising results.
Abstract:This study investigates the performance of three popular tokenization tools: MeCab, Sudachi, and SentencePiece, when applied as a preprocessing step for sentiment-based text classification of Japanese texts. Using Term Frequency-Inverse Document Frequency (TF-IDF) vectorization, we evaluate two traditional machine learning classifiers: Multinomial Naive Bayes and Logistic Regression. The results reveal that Sudachi produces tokens closely aligned with dictionary definitions, while MeCab and SentencePiece demonstrate faster processing speeds. The combination of SentencePiece, TF-IDF, and Logistic Regression outperforms the other alternatives in terms of classification performance.