Abstract: Chinese word segmentation and part-of-speech tagging are necessary tasks for computational linguistics and natural language processing applications. Many researchers still debate whether Chinese word segmentation and part-of-speech tagging are needed in the deep learning era. Nevertheless, resolving ambiguities and detecting unknown words remain challenging problems in this field. Previous studies on joint Chinese word segmentation and part-of-speech tagging mainly follow the character-based tagging paradigm and focus on modeling n-gram features. Unlike previous works, we propose a neural model named SpanSegTag for joint Chinese word segmentation and part-of-speech tagging that follows a span labeling formulation, in which the central problem is estimating the probability that each n-gram is a word with a particular part-of-speech tag. We apply a biaffine operation over the left and right boundary representations of consecutive characters to model the n-grams. Our experiments show that our BERT-based model SpanSegTag achieves competitive performance on the CTB5, CTB6, and UD benchmarks, and significant improvements on CTB7 and CTB9, compared with current state-of-the-art methods using BERT or ZEN encoders.
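To make the biaffine span scoring concrete, the following is a minimal sketch rather than the authors' released implementation: it assumes each character position contributes a left-boundary and a right-boundary vector (for example, the two directions of the encoder) and scores every span (i, j) against each segmentation/POS label with a bilinear product. The class name, tensor shapes, and usage example are hypothetical.

```python
# Minimal sketch (assumption, not the paper's code) of biaffine span scoring:
# left[i] and right[j] are boundary representations; the span (i, j) is scored
# against every label (word / POS tag) with a biaffine product.
import torch
import torch.nn as nn

class BiaffineSpanScorer(nn.Module):
    def __init__(self, hidden_dim, num_labels):
        super().__init__()
        # One (hidden+1) x (hidden+1) bilinear map per label; the appended
        # bias feature lets the same product cover the linear and bias terms.
        self.weight = nn.Parameter(
            torch.empty(num_labels, hidden_dim + 1, hidden_dim + 1))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, left, right):
        # left, right: (seq_len, hidden_dim) boundary representations.
        ones = left.new_ones(left.size(0), 1)
        left = torch.cat([left, ones], dim=-1)    # (seq_len, hidden_dim + 1)
        right = torch.cat([right, ones], dim=-1)  # (seq_len, hidden_dim + 1)
        # scores[k, i, j]: score that the span from boundary i to boundary j
        # forms a word carrying the k-th label.
        return torch.einsum("ih,khd,jd->kij", left, self.weight, right)

# Hypothetical usage: score all spans of an 8-character sentence.
scorer = BiaffineSpanScorer(hidden_dim=16, num_labels=4)
l = torch.randn(8, 16)   # e.g., forward-direction encoder states
r = torch.randn(8, 16)   # e.g., backward-direction encoder states
print(scorer(l, r).shape)  # torch.Size([4, 8, 8])
```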
Abstract: In this paper, we propose a span labeling approach, named SpanSeg, to model n-gram information for Vietnamese word segmentation. We compare the span labeling approach with conditional random fields using encoders with the same architecture. Since Vietnamese and Chinese exhibit similar linguistic phenomena, we evaluate the proposed method on the Vietnamese treebank benchmark dataset and five Chinese benchmark datasets. Our experimental results show that SpanSeg outperforms the sequence tagging approach, reaching a state-of-the-art F-score of 98.31% on the Vietnamese treebank benchmark when both approaches use the contextual pre-trained language model XLM-RoBERTa and predicted word boundary information. In addition, we fine-tune the span labeling approach on the BERT and ZEN pre-trained language models for Chinese, obtaining fewer parameters, faster inference, and competitive or higher F-scores than the previous state-of-the-art approach, word segmentation with wordhood memory networks, on five Chinese benchmarks.
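As a rough illustration of how span labels can be turned into a segmentation (the abstract does not spell out the decoder, so this is an assumption), the sketch below takes a score for each candidate span being a word and recovers the highest-scoring segmentation by dynamic programming. The function name, the max_word_len cap, and the toy scores are hypothetical.

```python
# Illustrative decoding sketch (not the paper's exact decoder): span_scores[(i, j)]
# is the score that characters/syllables i..j-1 form one word; we chain spans
# into the best-scoring segmentation of the whole sentence.
def decode_segmentation(span_scores, length, max_word_len=7):
    best = [float("-inf")] * (length + 1)  # best[j]: best score covering 0..j-1
    back = [0] * (length + 1)              # back[j]: start of the last word
    best[0] = 0.0
    for j in range(1, length + 1):
        for i in range(max(0, j - max_word_len), j):
            score = best[i] + span_scores.get((i, j), float("-inf"))
            if score > best[j]:
                best[j], back[j] = score, i
    # Recover word boundaries by walking the back-pointers.
    boundaries, j = [], length
    while j > 0:
        boundaries.append((back[j], j))
        j = back[j]
    return list(reversed(boundaries))

# Hypothetical example with a 4-syllable sentence and toy span scores.
scores = {(0, 2): 2.0, (2, 3): 1.0, (3, 4): 1.5, (0, 1): 0.5, (1, 2): 0.4}
print(decode_segmentation(scores, 4))  # [(0, 2), (2, 3), (3, 4)]
```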