Chulwoo Park

Constituency Structure over Eojeol in Korean Treebanks

Dec 27, 2025

Jungyeul Park, Chulwoo Park

Abstract: The design of Korean constituency treebanks raises a fundamental representational question concerning the choice of terminal units. Although Korean words are morphologically complex, treating morphemes as constituency terminals conflates word-internal morphology with phrase-level syntactic structure and creates mismatches with eojeol-based dependency resources. This paper argues for an eojeol-based constituency representation, with morphological segmentation and fine-grained part-of-speech information encoded in a separate, non-constituent layer. A comparative analysis shows that, under explicit normalization assumptions, the Sejong and Penn Korean treebanks can be treated as representationally equivalent at the eojeol-based constituency level. Building on this result, we outline an eojeol-based annotation scheme that preserves interpretable constituency and supports cross-treebank comparison and constituency-dependency conversion.

Enhancing Korean Dependency Parsing with Morphosyntactic Features

Mar 26, 2025

Jungyeul Park, Yige Chen, Kyuwon Kim, KyungTae Lim, Chulwoo Park

Abstract: This paper introduces UniDive for Korean, an integrated framework that bridges Universal Dependencies (UD) and Universal Morphology (UniMorph) to enhance the representation and processing of Korean morphosyntax. Korean's rich inflectional morphology and flexible word order pose challenges for existing frameworks, which often treat morphology and syntax separately, leading to inconsistencies in linguistic analysis. UniDive unifies syntactic and morphological annotations by preserving syntactic dependencies while incorporating UniMorph-derived features, improving consistency in annotation. We construct an integrated dataset and apply it to dependency parsing, demonstrating that enriched morphosyntactic features enhance parsing accuracy, particularly in distinguishing grammatical relations influenced by morphology. Our experiments, conducted with both encoder-only and decoder-only models, confirm that explicit morphological information contributes to more accurate syntactic analysis.

K-UD: Revising Korean Universal Dependencies Guidelines

Dec 01, 2024

Kyuwon Kim, Yige Chen, Eunkyul Leah Jo, KyungTae Lim, Jungyeul Park, Chulwoo Park

Abstract: Criticism has emerged of the existing linguistic annotation framework for Korean Universal Dependencies (UD), particularly with respect to syntactic relationships. In this paper, our primary objective is to refine the definition of syntactic dependency in UD for the analysis of Korean. Our aim is not only to reach a consensus within UD but also to gain agreement beyond the UD framework on analyzing Korean sentences with dependency structure, by establishing a linguistic consensus model.

Unlocking Korean Verbs: A User-Friendly Exploration into the Verb Lexicon

Oct 01, 2024

Seohyun Song, Eunkyul Leah Jo, Yige Chen, Jeen-Pyo Hong, Kyuwon Kim, Jin Wee, Miyoung Kang, KyungTae Lim, Jungyeul Park, Chulwoo Park

Abstract: The Sejong dictionary dataset offers a valuable resource, providing extensive coverage of morphology, syntax, and semantic representation. This dataset can be used to explore linguistic information in greater depth. The labeled linguistic structures within this dataset form the basis for uncovering relationships between words and phrases and their associations with target verbs. This paper introduces a user-friendly web interface designed for the collection and consolidation of verb-related information, with a particular focus on subcategorization frames. It also outlines our efforts to map this information by aligning subcategorization frames with corresponding illustrative sentence examples. Furthermore, we provide a Python library intended to simplify syntactic parsing and semantic role labeling. These tools are intended to assist individuals interested in using the Sejong dictionary dataset to develop applications for Korean language processing.

* COLING 2025 System Demonstrations (Submitted)

K-UniMorph: Korean Universal Morphology and its Feature Schema

May 17, 2023

Eunkyul Leah Jo, Kyuwon Kim, Xihan Wu, KyungTae Lim, Jungyeul Park, Chulwoo Park

Abstract: In this work we present a new Universal Morphology dataset for Korean. The Korean language has previously been underrepresented in the field of morphological paradigms among hundreds of diverse world languages. We therefore propose Universal Morphology paradigms for the Korean language that preserve its distinct characteristics. For our K-UniMorph dataset, we outline each grammatical criterion in detail for the verbal endings, clarify how to extract inflected forms, and demonstrate how we generate the morphological schemata. This dataset adopts the morphological feature schema from Sylak-Glassman et al. (2015) and Sylak-Glassman (2016) for the Korean language, and we extract inflected verb forms from the Sejong morphologically analyzed corpus, which is one of the largest annotated corpora for Korean. During data creation, our methodology also includes investigating the correctness of the conversion from the Sejong corpus. Furthermore, we carry out the inflection task using three different Korean word forms: letters, syllables and morphemes. Finally, we discuss and describe future perspectives on Korean morphological paradigms and the dataset.

* Findings of the Association for Computational Linguistics: ACL 2023 (Camera-ready)
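
Two of the three word forms named in the K-UniMorph abstract (letters and syllables) can be illustrated with a minimal Python sketch. It assumes that "letters" means the conjoining jamo produced by Unicode canonical decomposition (NFD); the paper may define its units differently. The function names are hypothetical, and morpheme segmentation is left out because it requires a morphological analyzer.

```python
import unicodedata


def to_syllables(word: str) -> list[str]:
    """Split a Korean word into precomposed Hangul syllable blocks."""
    # NFC keeps (or recomposes) one code point per syllable block.
    return list(unicodedata.normalize("NFC", word))


def to_letters(word: str) -> list[str]:
    """Split a Korean word into conjoining jamo (assumed 'letters')."""
    # NFD canonically decomposes each Hangul syllable into its
    # leading consonant, vowel, and optional trailing consonant.
    return list(unicodedata.normalize("NFD", word))


if __name__ == "__main__":
    word = "먹었다"  # an inflected verb form, used only as sample input
    print(to_syllables(word))
    print(to_letters(word))
```

Joining the jamo and normalizing back to NFC recovers the original syllable string, so the two views are interchangeable at the character level.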
