Abstract:At present, the deep end-to-end method based on supervised learning is used in entity recognition and dependency analysis. There are two problems in this method: firstly, background knowledge cannot be introduced; secondly, multi granularity and nested features of natural language cannot be recognized. In order to solve these problems, the annotation rules based on phrase window are proposed, and the corresponding multi-dimensional end-to-end phrase recognition algorithm is designed. This annotation rule divides sentences into seven types of nested phrases, and indicates the dependency between phrases. The algorithm can not only introduce background knowledge, recognize all kinds of nested phrases in sentences, but also recognize the dependency between phrases. The experimental results show that the annotation rule is easy to use and has no ambiguity; the matching algorithm is more consistent with the multi granularity and diversity characteristics of syntax than the traditional end-to-end algorithm. The experiment on CPWD dataset, by introducing background knowledge, the new algorithm improves the accuracy of the end-to-end method by more than one point. The corresponding method was applied to the CCL 2018 competition and won the first place in the task of Chinese humor type recognition.
Abstract:At present, most Natural Language Processing technology is based on the results of Word Segmentation for Dependency Parsing, which mainly uses an end-to-end method based on supervised learning. There are two main problems with this method: firstly, the la-beling rules are complex and the data is too difficult to label, the workload of which is large; secondly, the algorithm cannot recognize the multi-granularity and diversity of language components. In order to solve these two problems, we propose labeling rules based on phrase windows, and designed corresponding phrase recognition algorithms. The labeling rule uses phrases as the minimum unit, di-vides sentences into 7 types of nestable phrase types, and marks the grammatical dependencies between phrases. The corresponding algorithm, drawing on the idea of identifying the target area in the image field, can find the start and end positions of various phrases in the sentence, and realize the synchronous recognition of nested phrases and grammatical dependencies. The results of the experiment shows that the labeling rule is convenient and easy to use, and there is no ambiguity; the algorithm is more grammatically multi-granular and diverse than the end-to-end algorithm. Experiments on the CPWD dataset improve the accuracy of the end-to-end method by about 1 point. The corresponding method was applied to the CCL2018 competition, and the first place in the Chinese Metaphor Sentiment Analysis Task.