Tsinghua University, Beijing
Abstract: This paper reviews the development of Chinese word segmentation (CWS) over the most recent decade, 2007-2017. Special attention is paid to the deep learning technologies that have already permeated most areas of natural language processing (NLP). The basic view we have arrived at is that, compared to traditional supervised learning methods, neural network based methods have not shown any superior performance. The most critical challenge still lies in balancing the recognition of in-vocabulary (IV) and out-of-vocabulary (OOV) words. However, as neural models have the potential to capture the essential linguistic structure of natural language, we are optimistic that significant progress may arrive in the near future.
Abstract: Dependency Grammar has been used by linguists as the basis of the syntactic components of their grammar formalisms. It has also been used in natural language parsing. In China, attempts have been made to use this grammar formalism to parse Chinese sentences using corpus-based techniques. This paper reviews the properties of Dependency Grammar as embodied in four axioms for the well-formedness conditions of dependency structures. It is shown that allowing multiple governors, as done by some followers of this formalism, is unnecessary. The practice of augmenting Dependency Grammar with functional labels is also discussed in the light of building functional structures when the sentence is parsed. This will also facilitate semantic interpretation.