Mengyi Gao

Hierarchical Character Tagger for Short Text Spelling Error Correction

Sep 29, 2021

Mengyi Gao, Canran Xu, Peng Shi

Abstract: State-of-the-art approaches to the spelling error correction problem include Transformer-based Seq2Seq models, which require large training sets and suffer from slow inference, and sequence labeling models based on Transformer encoders such as BERT, which use a token-level label space and therefore require a large pre-defined vocabulary dictionary. In this paper we present a Hierarchical Character Tagger model, or HCTagger, for short text spelling error correction. We use a pre-trained language model at the character level as a text encoder, and then predict character-level edits that transform the original text into its error-free form, using a much smaller label space. For decoding, we propose a hierarchical multi-task approach that alleviates the long-tail label distribution issue without introducing extra model parameters. Experiments on two public misspelling correction datasets demonstrate that HCTagger is accurate and much faster than many existing models.
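The abstract does not give HCTagger's exact label set, so the following is only a minimal sketch, assuming a generic character-level edit tagging scheme: every source character gets KEEP, DELETE, or REPLACE_<c>, optionally suffixed with |APPEND_<text> for inserted characters, plus one leading slot for insertions at the very start. Tags are derived from a `difflib.SequenceMatcher` alignment between the misspelled and corrected strings. The names `derive_tags` and `apply_tags` and the tag strings are hypothetical, and the character-level encoder that would predict these tags is not shown.

```python
from difflib import SequenceMatcher
from typing import List

# Hypothetical tag vocabulary (an assumption, not taken from the paper):
#   KEEP, DELETE, REPLACE_<c>, each optionally followed by |APPEND_<text>
KEEP, DELETE, REPLACE, APPEND = "KEEP", "DELETE", "REPLACE_", "|APPEND_"


def derive_tags(source: str, target: str) -> List[str]:
    """Return len(source) + 1 tags: slot 0 is a start-of-text anchor that can
    only carry insertions; slot i + 1 labels source[i]."""
    base = [KEEP] * (len(source) + 1)
    append = [""] * (len(source) + 1)
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue
        if op == "insert":
            # Text inserted before source[i1] hangs off the slot of source[i1 - 1].
            append[i1] += target[j1:j2]
            continue
        # "delete" or "replace": pair source and target characters one-to-one.
        new = target[j1:j2]
        span = i2 - i1
        for k in range(span):
            base[i1 + k + 1] = REPLACE + new[k] if k < len(new) else DELETE
        if len(new) > span:
            # Surplus target characters follow the last source character of the span.
            append[i2] += new[span:]
    return [b + APPEND + a if a else b for b, a in zip(base, append)]


def apply_tags(source: str, tags: List[str]) -> str:
    """Rebuild the corrected text from the source and one tag per slot."""
    out = []
    for slot, tag in enumerate(tags):
        base, _, appended = tag.partition(APPEND)
        if slot > 0:
            if base == KEEP:
                out.append(source[slot - 1])
            elif base.startswith(REPLACE):
                out.append(base[len(REPLACE):])
            # DELETE emits nothing for this character.
        out.append(appended)
    return "".join(out)


if __name__ == "__main__":
    tags = derive_tags("speling", "spelling")
    print(tags)
    print(apply_tags("speling", tags))
```

For "speling" → "spelling", every tag is KEEP except the one on the "l", which becomes KEEP|APPEND_l. In a scheme like this, the REPLACE labels grow with the character set rather than with a word-piece vocabulary, which matches the intuition behind the smaller label space described in the abstract; multi-character APPEND strings would enlarge it again, so a practical system would probably cap or split them.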

* To appear in the WNUT 2021 workshop; 8 pages, 2 figures