Xiao-Yong Yan

Maximum Entropy, Word-Frequency, Chinese Characters, and Multiple Meanings

Mar 29, 2015

Xiao-Yong Yan, Petter Minnhagen

Abstract: The word-frequency distribution of a text written by an author is well accounted for by a maximum entropy distribution, the RGF (random group formation) prediction. The RGF-distribution is completely determined by the a priori values of the total number of words in the text (M), the number of distinct words (N), and the number of repetitions of the most common word (k_max). It is shown here that this maximum entropy prediction also describes a text written in Chinese characters. In particular, it is shown that although the same Chinese text, counted in words and counted in Chinese characters, has quite differently shaped distributions, both are nevertheless well predicted by their respective three a priori characteristic values. It is pointed out that this is analogous to the change in the shape of the distribution when translating a given text into another language. Another consequence of the RGF-prediction is that taking a part of a long text will change the input parameters (M, N, k_max) and consequently also the shape of the frequency distribution. This is explicitly confirmed for texts written in Chinese characters. Since the RGF-prediction has no system-specific information beyond the three a priori values (M, N, k_max), any specific language characteristic has to be sought in systematic deviations between the RGF-prediction and the measured frequencies. One such systematic deviation is identified and, through a statistical information-theoretical argument and an extended RGF-model, it is proposed that this deviation is caused by multiple meanings of Chinese characters. The effect is stronger for Chinese characters than for Chinese words. The relations among Zipf's law, the Simon model for texts, and the present results are discussed.

* PLoS ONE 10(5): e0125592 (2015)
* 15 pages, 10 figures, 2 tables
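
The abstract above states that the RGF-prediction takes only three inputs: M, N, and k_max. As a minimal sketch of how those inputs could be counted (not the RGF-distribution itself, which the abstract does not spell out), the Python below computes them for a toy text counted in words, in characters, and for a part of the character sequence. The function name `rgf_inputs` and the hand-segmented toy text are assumptions made for illustration only.

```python
from collections import Counter


def rgf_inputs(tokens):
    """Return (M, N, k_max) for a sequence of tokens.

    M     = total number of tokens
    N     = number of distinct tokens
    k_max = number of occurrences of the most common token
    """
    counts = Counter(tokens)
    M = sum(counts.values())
    N = len(counts)
    k_max = max(counts.values()) if counts else 0
    return M, N, k_max


# Toy example (hypothetical data); word segmentation is supplied by hand.
words = ["我们", "学习", "中文", "我们", "喜欢", "中文"]
chars = [c for w in words for c in w]

words_mnk = rgf_inputs(words)                      # same text, counted in words
chars_mnk = rgf_inputs(chars)                      # same text, counted in characters
part_mnk = rgf_inputs(chars[: len(chars) // 2])    # first half of the characters

print(words_mnk, chars_mnk, part_mnk)
```

Even on this tiny text, the three triples differ, which is the sense in which counting in words versus characters, or taking only a part of a text, changes the inputs to the prediction.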

Efficient learning strategy of Chinese characters based on network approach

Mar 07, 2013

Xiao-Yong Yan, Ying Fan, Zengru Di, Shlomo Havlin, Jinshan Wu

Abstract: Based on network analysis of hierarchical structural relations among Chinese characters, we develop an efficient learning strategy for Chinese characters. We regard a learning method as more efficient if one learns the same number of useful Chinese characters with less effort or in less time. We construct a node-weighted network of Chinese characters, where character usage frequencies are used as node weights. Using this hierarchical node-weighted network, we propose a new learning method, the distributed node weight (DNW) strategy, which is based on a new measure of nodes' importance that takes into account both the weight of the nodes and the hierarchical structure of the network. Chinese character learning strategies, particularly their learning order, are analyzed as dynamical processes over the network. We compare the efficiency of three theoretical learning methods and two commonly used methods from mainstream Chinese textbooks, one for Chinese elementary school students and the other for students learning Chinese as a second language. We find that the DNW method significantly outperforms the others, implying that the efficiency of the learning methods in current major textbooks can be greatly improved.

* PLoS ONE 8(8): e69745 (2013)
* 8 pages, 6 figures
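
The abstract above describes an importance measure that combines node weights (usage frequencies) with the hierarchical structure of the character network, but it does not give the formula. The Python below is therefore only an illustrative sketch of that general idea, not the paper's DNW definition: each character's score is its own weight plus a discounted share of the scores of the characters built from it. The recursion, the discount factor `b`, the toy network, and the frequency values are all assumptions.

```python
# Hypothetical toy network: component -> characters built directly from it.
built_from = {
    "木": ["林"],
    "林": ["森"],
    "森": [],
    "人": ["从"],
    "从": [],
}

# Hypothetical usage frequencies used as node weights (made-up numbers).
weight = {"木": 5, "林": 3, "森": 2, "人": 20, "从": 8}


def importance(char, b=0.5, memo=None):
    """Own weight plus a b-discounted sum of the importance of the
    characters that contain `char` (an assumed, illustrative rule)."""
    if memo is None:
        memo = {}
    if char not in memo:
        memo[char] = weight[char] + b * sum(
            importance(child, b, memo) for child in built_from[char]
        )
    return memo[char]


scores = {c: importance(c) for c in weight}

# Candidate learning order: highest score first. Note that this simple sort
# does not by itself guarantee that components precede the characters
# built from them.
order = sorted(scores, key=scores.get, reverse=True)
print(order)
```

In this toy network a frequent component such as 人 ranks first because it is both common on its own and a building block of another common character, which is the kind of trade-off between frequency and structure that the abstract describes.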
