Hai Jin Park

Developing a Pragmatic Benchmark for Assessing Korean Legal Language Understanding in Large Language Models

Oct 11, 2024

Yeeun Kim, Young Rok Choi, Eunkyung Choi, Jinhwan Choi, Hai Jin Park, Wonseok Hwang

Abstract: Large language models (LLMs) have demonstrated remarkable performance in the legal domain, with GPT-4 even passing the Uniform Bar Exam in the U.S. However, their efficacy remains limited for non-standardized tasks and tasks in languages other than English. This underscores the need for careful evaluation of LLMs within each legal system before application. Here, we introduce KBL, a benchmark for assessing the Korean legal language understanding of LLMs, consisting of (1) 7 legal knowledge tasks (510 examples), (2) 4 legal reasoning tasks (288 examples), and (3) the Korean bar exam (4 domains, 53 tasks, 2,510 examples). The first two datasets were developed in close collaboration with lawyers to evaluate LLMs in practical scenarios in a certified manner. Furthermore, considering legal practitioners' frequent use of extensive legal documents for research, we assess LLMs in both a closed-book setting, where they rely solely on internal knowledge, and a retrieval-augmented generation (RAG) setting, using a corpus of Korean statutes and precedents. The results indicate substantial room and opportunities for improvement.

* EMNLP 2024 Findings

Data-efficient End-to-end Information Extraction for Statistical Legal Analysis

Nov 03, 2022

Wonseok Hwang, Saehee Eom, Hanuhl Lee, Hai Jin Park, Minjoon Seo

Abstract: Legal practitioners often face a vast amount of documents. Lawyers, for instance, search for appropriate precedents favorable to their clients, while the number of legal precedents is ever-growing. Although legal search engines can assist in finding individual target documents and narrowing down the number of candidates, retrieved information is often presented as unstructured text, and users have to examine each document thoroughly, which can lead to information overload. This also makes statistical analysis of the documents challenging. Here, we present an end-to-end information extraction (IE) system for legal documents. By formulating IE as a generation task, our system can be easily applied to various tasks without domain-specific engineering effort. The experimental results of four IE tasks on Korean precedents show that our IE system can achieve competent scores (-2.3 on average) compared to the rule-based baseline with as few as 50 training examples per task, and a higher score (+5.4 on average) with 200 examples. Finally, our statistical analysis of two case categories, drunk driving and fraud, with 35k precedents reveals that the resulting structured information from our IE system faithfully reflects the macroscopic features of the Korean legal system.

* NLLP workshop @ EMNLP 2022
