Abstract:This paper introduces EVOKE, a parallel dataset of emotion vocabulary in English and Korean. The dataset offers comprehensive coverage of emotion words in each language, in addition to many-to-many translations between words in the two languages and identification of language-specific emotion words. The dataset contains 1,427 Korean words and 1,399 English words, and we systematically annotate 819 Korean and 924 English adjectives and verbs. We also annotate multiple meanings of each word and their relationships, identifying polysemous emotion words and emotion-related metaphors. The dataset is, to our knowledge, the most comprehensive, systematic, and theory-agnostic dataset of emotion words in both Korean and English to date. It can serve as a practical tool for emotion science, psycholinguistics, computational linguistics, and natural language processing, allowing researchers to adopt different views on the resource reflecting their needs and theoretical perspectives. The dataset is publicly available at https://github.com/yoonwonj/EVOKE.
Abstract:This study investigates whether division on political topics is mapped with the distinctive patterns of language use. We collect a total 145,832 Reddit comments on the abortion debate and explore the languages of subreddit communities r/prolife and r/prochoice. With consideration of the Moral Foundations Theory, we examine lexical patterns in three ways. First, we compute proportional frequencies of lexical items from the Moral Foundations Dictionary in order to make inferences about each group's moral considerations when forming arguments for and against abortion. We then create n-gram models to reveal frequent collocations from each stance group and better understand how commonly used words are patterned in their linguistic context and in relation to morality values. Finally, we use Latent Dirichlet Allocation to identify underlying topical structures in the corpus data. Results show that the use of morality words is mapped with the stances on abortion.
Abstract:Markedness in natural language is often associated with non-literal meanings in discourse. Differential Object Marking (DOM) in Korean is one instance of this phenomenon, where post-positional markers are selected based on both the semantic features of the noun phrases and the discourse features that are orthogonal to the semantic features. Previous work has shown that distributional models of language recover certain semantic features of words -- do these models capture implied discourse-level meanings as well? We evaluate whether a set of large language models are capable of associating discourse meanings with different object markings in Korean. Results suggest that discourse meanings of a grammatical marker can be more challenging to encode than that of a discourse marker.