Jinsik Lee

KoBALT: Korean Benchmark For Advanced Linguistic Tasks

May 22, 2025

Hyopil Shin, Sangah Lee, Dongjun Jang, Wooseok Song, Jaeyoon Kim, Chaeyoung Oh, Hyemi Jo, Youngchae Ahn, Sihyun Oh, Hyohyeong Chang(+2 more)

Abstract: We introduce KoBALT (Korean Benchmark for Advanced Linguistic Tasks), a comprehensive, linguistically motivated benchmark comprising 700 multiple-choice questions spanning 24 phenomena across five linguistic domains: syntax, semantics, pragmatics, phonetics/phonology, and morphology. KoBALT is designed to advance the evaluation of large language models (LLMs) in Korean, a morphologically rich language, by addressing the limitations of conventional benchmarks, which often lack linguistic depth and typological grounding. It introduces a suite of expert-curated, linguistically motivated questions with minimal n-gram overlap with standard Korean corpora, substantially mitigating the risk of data contamination and allowing a more robust assessment of true language understanding. Our evaluation of 20 contemporary LLMs reveals significant performance disparities: the highest-performing model achieves 61% general accuracy but shows substantial variation across linguistic domains, ranging from stronger performance in semantics (66%) to considerable weaknesses in phonology (31%) and morphology (36%). Through a human preference evaluation with 95 annotators, we demonstrate a strong correlation between KoBALT scores and human judgments, validating the benchmark's effectiveness as a discriminative measure of Korean language understanding. KoBALT addresses critical gaps in linguistic evaluation for typologically diverse languages and provides a robust framework for assessing genuine linguistic competence in Korean language models.

* Under Review
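The abstract reports one overall accuracy next to per-domain accuracies. A minimal sketch of how such a breakdown can be computed from graded multiple-choice answers follows. It assumes "general accuracy" is micro-averaged over all questions, which the abstract does not state; the `accuracy_by_domain` helper, the record format, and the toy data are hypothetical and are not KoBALT results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_domain(
    records: Iterable[Tuple[str, str, str]],
) -> Tuple[float, Dict[str, float], float]:
    """records: (domain, predicted_choice, gold_choice) triples (hypothetical format).

    Returns the micro-averaged overall accuracy, the per-domain accuracies,
    and the macro average of the per-domain accuracies.
    """
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for domain, pred, gold in records:
        totals[domain] += 1
        correct[domain] += int(pred == gold)
    if not totals:
        raise ValueError("no records to score")
    # Micro average: every question counts equally, so large domains dominate.
    micro = sum(correct.values()) / sum(totals.values())
    per_domain = {d: correct[d] / totals[d] for d in totals}
    # Macro average: every domain counts equally, regardless of its size.
    macro = sum(per_domain.values()) / len(per_domain)
    return micro, per_domain, macro


# Toy, made-up predictions for illustration only.
records = [
    ("syntax", "A", "A"),
    ("syntax", "B", "B"),
    ("syntax", "C", "D"),
    ("phonology", "A", "B"),
]
micro, per_domain, macro = accuracy_by_domain(records)
print(micro)  # → 0.5
print(per_domain, macro)
```

When domains contain different numbers of questions, the micro and macro averages diverge (0.5 versus about 0.33 on the toy data), so a single headline number should state which aggregation it uses.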

EXAONE Deep: Reasoning Enhanced Language Models

Mar 16, 2025

LG AI Research, Kyunghoon Bae, Eunbi Choi, Kibong Choi, Stanley Jungkyu Choi, Yemuk Choi, Seokhee Hong, Junwon Hwang, Hyojin Jeon, Kijeong Jeon(+22 more)

Abstract: We present the EXAONE Deep series, which exhibits superior capabilities in various reasoning tasks, including math and coding benchmarks. We train our models mainly on a reasoning-specialized dataset that incorporates long streams of thought processes. Evaluation results show that our smaller models, EXAONE Deep 2.4B and 7.8B, outperform other models of comparable size, while the largest model, EXAONE Deep 32B, demonstrates competitive performance against leading open-weight models. All EXAONE Deep models are openly available for research purposes and can be downloaded from https://huggingface.co/LGAI-EXAONE.

* arXiv admin note: substantial text overlap with arXiv:2412.04862, arXiv:2408.03541

EXAONE 3.5: Series of Large Language Models for Real-world Use Cases

Dec 09, 2024

LG AI Research, Soyoung An, Kyunghoon Bae, Eunbi Choi, Kibong Choi, Stanley Jungkyu Choi, Seokhee Hong, Junwon Hwang, Hyojin Jeon, Gerrard Jeongwon Jo(+23 more)

Abstract: This technical report introduces the EXAONE 3.5 instruction-tuned language models, developed and released by LG AI Research. The EXAONE 3.5 language models are offered in three configurations: 32B, 7.8B, and 2.4B. These models feature several standout capabilities: 1) exceptional instruction-following capabilities in real-world scenarios, achieving the highest scores across seven benchmarks; 2) outstanding long-context comprehension, attaining top performance in four benchmarks; and 3) competitive results against state-of-the-art open models of similar sizes across nine general benchmarks. The EXAONE 3.5 language models are open to anyone for research purposes and can be downloaded from https://huggingface.co/LGAI-EXAONE. For commercial use, please reach out to the official contact point of LG AI Research: contact_us@lgresearch.ai.

* arXiv admin note: text overlap with arXiv:2408.03541

EXAONE 3.0 7.8B Instruction Tuned Language Model

Aug 07, 2024

LG AI Research, Soyoung An, Kyunghoon Bae, Eunbi Choi, Stanley Jungkyu Choi, Yemuk Choi, Seokhee Hong, Yeonjung Hong, Junwon Hwang, Hyojin Jeon(+28 more)

Abstract: We introduce the EXAONE 3.0 instruction-tuned language model, the first open model in the family of large language models (LLMs) developed by LG AI Research. Among the different model sizes, we publicly release the 7.8B instruction-tuned model to promote open research and innovation. Through extensive evaluations across a wide range of public and in-house benchmarks, EXAONE 3.0 demonstrates highly competitive real-world performance with instruction-following capability against other state-of-the-art open models of similar size. Our comparative analysis shows that EXAONE 3.0 excels particularly in Korean, while achieving compelling performance across general tasks and complex reasoning. With its strong real-world effectiveness and bilingual proficiency, we hope that EXAONE keeps contributing to advancements in Expert AI. Our EXAONE 3.0 instruction-tuned model is available at https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct

Boosting Cross-lingual Transferability in Multilingual Models via In-Context Learning

May 24, 2023

Sunkyoung Kim, Dayeon Ki, Yireun Kim, Jinsik Lee

Abstract: Existing cross-lingual transfer (CLT) prompting methods are concerned only with monolingual demonstration examples in the source language. In this paper, we propose In-CLT, a novel cross-lingual transfer prompting method that leverages both source and target languages to construct the demonstration examples. We conduct comprehensive evaluations on multilingual benchmarks, focusing on question answering tasks. Experimental results show that In-CLT prompting not only improves multilingual models' cross-lingual transferability but also demonstrates remarkable generalization to unseen languages. In particular, In-CLT prompting improves model performance by 10 to 20 percentage points on average compared with prior cross-lingual transfer approaches. We also observe surprising performance gains on other multilingual benchmarks, especially in reasoning tasks. Furthermore, we investigate the relationship between lexical similarity and pre-training corpora in terms of the cross-lingual transfer gap.

* Work In Progress
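The abstract says In-CLT builds demonstrations from both the source and the target language but does not spell out the exact layout. The sketch below contrasts conventional source-only demonstrations with one mixed-language layout. It assumes, hypothetically, that each demonstration takes its context from a source-language example and its question and answer from a parallel target-language example. `QAExample`, `format_demo`, and both prompt builders are illustrative names, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class QAExample:
    context: str
    question: str
    answer: str


def format_demo(context: str, question: str, answer: str = "") -> str:
    # An empty answer leaves the "Answer:" slot open for the model to complete.
    return f"Context: {context}\nQuestion: {question}\nAnswer: {answer}".rstrip()


def monolingual_prompt(demos: List[QAExample], query: QAExample) -> str:
    """Baseline CLT prompt: every demonstration is in the source language only."""
    parts = [format_demo(d.context, d.question, d.answer) for d in demos]
    parts.append(format_demo(query.context, query.question))
    return "\n\n".join(parts)


def mixed_language_prompt(
    parallel_demos: List[Tuple[QAExample, QAExample]], query: QAExample
) -> str:
    """Hypothetical mixing rule: source-language context, target-language Q/A."""
    parts = [
        format_demo(src.context, tgt.question, tgt.answer)
        for src, tgt in parallel_demos
    ]
    # The test query itself is in the target language, with no answer.
    parts.append(format_demo(query.context, query.question))
    return "\n\n".join(parts)


# Toy parallel demonstration (English source, Korean target) and a Korean query.
src = QAExample("Paris is the capital of France.", "What is the capital of France?", "Paris")
tgt = QAExample("파리는 프랑스의 수도이다.", "프랑스의 수도는 어디인가?", "파리")
query = QAExample("서울은 한국의 수도이다.", "한국의 수도는 어디인가?", "")

print(mixed_language_prompt([(src, tgt)], query))
```

The resulting string would be passed to a multilingual model as a few-shot prompt. Trying a different mixing rule only changes the list comprehension in `mixed_language_prompt`, which keeps the baseline and the variant directly comparable.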

SUMBT: Slot-Utterance Matching for Universal and Scalable Belief Tracking

Jul 17, 2019

Hwaran Lee, Jinsik Lee, Tae-Yoon Kim

Abstract: In goal-oriented dialog systems, belief trackers estimate the probability distribution of slot-values at every dialog turn. Previous neural approaches have modeled domain- and slot-dependent belief trackers and have difficulty adding new slot-values, resulting in a lack of flexibility in domain ontology configurations. In this paper, we propose a new approach to universal and scalable belief tracking, called the slot-utterance matching belief tracker (SUMBT). The model learns the relations between domain-slot-types and slot-values appearing in utterances through attention mechanisms based on contextual semantic vectors. Furthermore, the model predicts slot-value labels in a non-parametric way. In experiments on two dialog corpora, WOZ 2.0 and MultiWOZ, the proposed model improved performance over slot-dependent methods and achieved state-of-the-art joint accuracy.

* 6 pages, 2 figures, The 57th Annual Meeting of the Association for Computational Linguistics (ACL)
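The two mechanisms the abstract names, attention from a domain-slot representation over the utterance and non-parametric slot-value prediction, can be sketched in a few lines of NumPy. This is a simplified illustration, not the SUMBT implementation: it uses a single attention head, hand-set toy vectors in place of contextual encoder outputs, omits the turn-to-turn belief tracker, and assumes Euclidean distance as the matching score. The function names are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()


def slot_utterance_attention(slot_vec: np.ndarray, token_vecs: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention: the domain-slot vector (d,)
    queries the utterance token vectors (T, d) and returns a (d,) summary."""
    d = slot_vec.shape[-1]
    scores = token_vecs @ slot_vec / np.sqrt(d)  # (T,) relevance of each token
    weights = softmax(scores)
    return weights @ token_vecs  # attention-weighted sum of token vectors


def value_distribution(context_vec: np.ndarray, value_vecs: np.ndarray) -> np.ndarray:
    """Non-parametric prediction: softmax over negative Euclidean distances
    between the attended vector and each candidate slot-value embedding (V, d)."""
    dists = np.linalg.norm(value_vecs - context_vec, axis=1)  # (V,)
    return softmax(-dists)


# Toy vectors for illustration only.
tokens = np.array([[1.0, 0.0], [0.0, 1.0]])                # utterance token vectors
slot = np.array([10.0, 0.0])                               # domain-slot query vector
values = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])   # candidate value embeddings

probs = value_distribution(slot_utterance_attention(slot, tokens), values)
print(int(np.argmax(probs)))  # → 0
```

Because the value distribution is a softmax over distances rather than over a fixed output layer, supporting a new slot-value only requires embedding it and appending a row to `value_vecs`; no classifier weights need to be resized. This is the flexibility the abstract contrasts with slot-dependent trackers.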
