Sejoon Kim

Context-Aware LLM Translation System Using Conversation Summarization and Dialogue History

Oct 22, 2024

Mingi Sung, Seungmin Lee, Jiwon Kim, Sejoon Kim

Abstract: Translating conversational text, particularly in customer support contexts, presents unique challenges due to its informal and unstructured nature. We propose a context-aware LLM translation system that leverages conversation summarization and dialogue history to enhance translation quality for the English-Korean language pair. Our approach incorporates the two most recent dialogues as raw data and a summary of earlier conversations to manage context length effectively. We demonstrate that this method significantly improves translation accuracy, maintaining coherence and consistency across conversations. This system offers a practical solution for customer support translation tasks, addressing the complexities of conversational text.

* Accepted to WMT 2024
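
The context scheme described in the abstract (recent dialogue kept verbatim, earlier dialogue condensed into a summary) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: it reads "the two most recent dialogues" as the two most recent turns, and the names `split_history`, `build_prompt`, the `summarize` callable, and the prompt wording are all hypothetical. In a real system `summarize` would presumably call an LLM; here it is any function that maps a list of turns to a string.

```python
from typing import Callable, List, Tuple


def split_history(history: List[str], keep_recent: int = 2) -> Tuple[List[str], List[str]]:
    """Split turns into (earlier turns to summarize, recent turns kept verbatim)."""
    if keep_recent <= 0:
        return list(history), []
    return list(history[:-keep_recent]), list(history[-keep_recent:])


def build_prompt(
    history: List[str],
    source_text: str,
    summarize: Callable[[List[str]], str],  # hypothetical summarizer, e.g. an LLM call
    keep_recent: int = 2,
) -> str:
    """Assemble a translation prompt from a summary, recent turns, and the new text."""
    earlier, recent = split_history(history, keep_recent)
    parts = []
    if earlier:
        # Only earlier turns are compressed, which bounds the context length.
        parts.append("Summary of earlier conversation:\n" + summarize(earlier))
    if recent:
        parts.append("Recent turns:\n" + "\n".join(recent))
    # Hypothetical instruction wording; the paper's actual prompt is not given here.
    parts.append("Translate into Korean:\n" + source_text)
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo_history = [
        "Customer: Hi, I ordered a phone case last week.",
        "Agent: Thanks for reaching out. Could you share the order number?",
        "Customer: It's in my confirmation email.",
        "Agent: Got it, let me check the status.",
    ]
    # Stand-in summarizer for the demo; it only reports how many turns were condensed.
    print(build_prompt(demo_history, "It still hasn't arrived.",
                       summarize=lambda turns: f"({len(turns)} earlier turns condensed)"))
```

Keeping the summarizer as a parameter makes it easy to swap in a different model or a cached summary without touching the prompt assembly.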

Efficient Terminology Integration for LLM-based Translation in Specialized Domains

Oct 21, 2024

Sejoon Kim, Mingi Sung, Jeonghwan Lee, Hyunkuk Lim, Jorge Froilan Gimenez Perez

Abstract: Traditional machine translation methods typically involve training models directly on large parallel corpora, with limited emphasis on specialized terminology. However, in specialized fields such as patent, finance, or biomedical domains, terminology is crucial for translation, with many terms that need to be translated following agreed-upon conventions. In this paper, we introduce a methodology that efficiently trains models with a smaller amount of data while preserving the accuracy of terminology translation. We achieve this through a systematic process of term extraction and glossary creation using the Trie Tree algorithm, followed by data reconstruction to teach the LLM how to integrate these specialized terms. This methodology enhances the model's ability to handle specialized terminology and ensures high-quality translations, particularly in fields where term consistency is crucial. Our approach has demonstrated exceptional performance, achieving the highest translation score among participants in the WMT patent task to date, showcasing its effectiveness and broad applicability in specialized translation domains where general methods often fall short.

* Accepted to WMT 2024
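
To make the trie-based term lookup mentioned in the abstract concrete, here is a minimal sketch of a character-level trie that finds the longest glossary match at each position in a source sentence. It is an assumption-laden illustration, not the authors' pipeline: the class and method names (`GlossaryTrie`, `add`, `find_terms`) and the glossary entries are hypothetical, and the abstract does not specify how matches are then used in data reconstruction.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One trie node; `target` is set only where a glossary term ends."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.target: Optional[str] = None


class GlossaryTrie:
    """Hypothetical glossary index mapping source terms to target terms."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, source_term: str, target_term: str) -> None:
        node = self.root
        for ch in source_term:
            node = node.children.setdefault(ch, TrieNode())
        node.target = target_term

    def find_terms(self, text: str) -> List[Tuple[str, str]]:
        """Scan left to right, taking the longest glossary match at each position."""
        matches: List[Tuple[str, str]] = []
        i = 0
        while i < len(text):
            node = self.root
            best: Optional[Tuple[str, str, int]] = None
            j = i
            while j < len(text) and text[j] in node.children:
                node = node.children[text[j]]
                j += 1
                if node.target is not None:
                    best = (text[i:j], node.target, j)
            if best is not None:
                matches.append((best[0], best[1]))
                i = best[2]  # skip past the matched term
            else:
                i += 1
        return matches


if __name__ == "__main__":
    trie = GlossaryTrie()
    # Hypothetical glossary entries, for illustration only.
    trie.add("semiconductor", "반도체")
    trie.add("semiconductor device", "반도체 장치")
    print(trie.find_terms("a semiconductor device and a semiconductor"))
```

This sketch matches raw characters, so it can also match inside longer words (for example, "semiconductor" within "semiconductors"); a production system would likely add tokenization or word-boundary checks.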
