Abstract:The Complete Tang Poems (CTP) is the most important source to study Tang poems. We look into CTP with computational tools from specific linguistic perspectives, including distributional semantics and collocational analysis. From such quantitative viewpoints, we compare the usage of "wind" and "moon" in the poems of Li Bai and Du Fu. Colors in poems function like sounds in movies, and play a crucial role in the imageries of poems. Thus, words for colors are studied, and "white" is the main focus because it is the most frequent color in CTP. We also explore some cases of using colored words in antithesis pairs that were central for fostering the imageries of the poems. CTP also contains useful historical information, and we extract person names in CTP to study the social networks of the Tang poets. Such information can then be integrated with the China Biographical Database of Harvard University.
Abstract:We analyzed historical and literary documents in Chinese to gain insights into research issues, and overview our studies which utilized four different sources of text materials in this paper. We investigated the history of concepts and transliterated words in China with the Database for the Study of Modern China Thought and Literature, which contains historical documents about China between 1830 and 1930. We also attempted to disambiguate names that were shared by multiple government officers who served between 618 and 1912 and were recorded in Chinese local gazetteers. To showcase the potentials and challenges of computer-assisted analysis of Chinese literatures, we explored some interesting yet non-trivial questions about two of the Four Great Classical Novels of China: (1) Which monsters attempted to consume the Buddhist monk Xuanzang in the Journey to the West (JTTW), which was published in the 16th century, (2) Which was the most powerful monster in JTTW, and (3) Which major role smiled the most in the Dream of the Red Chamber, which was published in the 18th century. Similar approaches can be applied to the analysis and study of modern documents, such as the newspaper articles published about the 228 incident that occurred in 1947 in Taiwan.
Abstract:We report applications of language technology to analyzing historical documents in the Database for the Study of Modern Chinese Thoughts and Literature (DSMCTL). We studied two historical issues with the reported techniques: the conceptualization of "huaren" (Chinese people) and the attempt to institute constitutional monarchy in the late Qing dynasty. We also discuss research challenges for supporting sophisticated issues using our experience with DSMCTL, the Database of Government Officials of the Republic of China, and the Dream of the Red Chamber. Advanced techniques and tools for lexical, syntactic, semantic, and pragmatic processing of language information, along with more thorough data collection, are needed to strengthen the collaboration between historians and computer scientists.