Youngmin Kim

MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation

May 24, 2025

Woohyun Cho, Youngmin Kim, Sunghyun Lee, Youngjae Yu

Abstract:Lyrics translation requires both accurate semantic transfer and preservation of musical rhythm, syllabic structure, and poetic style. In animated musicals, the challenge intensifies due to alignment with visual and auditory cues. We introduce the Multilingual Audio-Video Lyrics Benchmark for Animated Song Translation (MAVL), the first multilingual, multimodal benchmark for singable lyrics translation. By integrating text, audio, and video, MAVL enables richer and more expressive translations than text-only approaches. Building on this, we propose the Syllable-Constrained Audio-Video LLM with Chain-of-Thought (SylAVL-CoT), which leverages audio-video cues and enforces syllabic constraints to produce natural-sounding lyrics. Experimental results demonstrate that SylAVL-CoT significantly outperforms text-based models in singability and contextual accuracy, emphasizing the value of multimodal, multilingual approaches for lyrics translation.

* 28 pages, 8 figures

Scalp Diagnostic System With Label-Free Segmentation and Training-Free Image Translation

Jun 26, 2024

Youngmin Kim, Saejin Kim, Hoyeon Moon, Youngjae Yu, Junhyug Noh

Abstract:Scalp diseases and alopecia affect millions of people around the world, underscoring the urgent need for early diagnosis and management of the disease. However, the development of a comprehensive AI-based diagnosis system encompassing these conditions remains an underexplored domain due to the challenges associated with data imbalance and the costly nature of labeling. To address these issues, we propose ScalpVision, an AI-driven system for the holistic diagnosis of scalp diseases and alopecia. In ScalpVision, effective hair segmentation is achieved using pseudo image-label pairs and an innovative prompting method in the absence of traditional hair masking labels. This approach is crucial for extracting key features such as hair thickness and count, which are then used to assess alopecia severity. Additionally, ScalpVision introduces DiffuseIT-M, a generative model adept at dataset augmentation while maintaining hair information, facilitating improved predictions of scalp disease severity. Our experimental results affirm ScalpVision's efficiency in diagnosing a variety of scalp conditions and alopecia, showcasing its potential as a valuable tool in dermatological care.

* IEEE Transactions on Medical Imaging (Under Review)

Automatic Question-Answer Generation for Long-Tail Knowledge

Mar 03, 2024

Rohan Kumar, Youngmin Kim, Sunitha Ravi, Haitian Sun, Christos Faloutsos, Ruslan Salakhutdinov, Minji Yoon

Abstract:Pretrained Large Language Models (LLMs) have gained significant attention for addressing open-domain Question Answering (QA). While they exhibit high accuracy in answering questions related to common knowledge, LLMs encounter difficulties in learning about uncommon long-tail knowledge (tail entities). Since manually constructing QA datasets demands substantial human resources, the types of existing QA datasets are limited, leaving us with a scarcity of datasets to study the performance of LLMs on tail entities. In this paper, we propose an automatic approach to generate specialized QA datasets for tail entities and present the associated research challenges. We conduct extensive experiments by employing pretrained LLMs on our newly generated long-tail QA datasets, comparing their performance with and without external resources including Wikipedia and Wikidata knowledge graphs.

* Accepted at KDD 2023 KnowledgeNLP

Keypoint based Sign Language Translation without Glosses

Apr 22, 2022

Youngmin Kim, Minji Kwak, Dain Lee, Yeongeun Kim, Hyeongboo Baek

Abstract:Sign Language Translation (SLT) has received relatively little study compared with Sign Language Recognition (SLR). However, SLR only recognizes the unique grammar of sign language, which differs from that of spoken language, so its output is not easily interpreted by non-disabled people. We therefore address the problem of translating sign language video directly into spoken language. To this end, we propose a new keypoint normalization method that performs translation based on the signer's skeleton points and normalizes these points robustly. Customizing the normalization to each body part contributes to improved performance. In addition, we propose a stochastic frame selection method that enables frame augmentation and sampling at the same time. Finally, the result is translated into spoken language by an attention-based translation model. Because it does not rely on glosses, our method can be applied to a variety of datasets, including those without gloss annotations. Quantitative experimental evaluation demonstrates the effectiveness of our method.

* 14 pages, 5 figures, IEEE Sensors Journals

Are Evolutionary Algorithms Safe Optimizers?

Mar 24, 2022

Youngmin Kim, Richard Allmendinger, Manuel López-Ibáñez

Abstract:We consider a type of constrained optimization problem, where the violation of a constraint leads to an irrevocable loss, such as breakage of a valuable experimental resource/platform or loss of human life. Such problems are referred to as safe optimization problems (SafeOPs). While SafeOPs have received attention in the machine learning community in recent years, there has been little interest in the evolutionary computation (EC) community despite some early attempts between 2009 and 2011. Moreover, there is a lack of accepted guidelines on how to benchmark different algorithms for SafeOPs, an area in which the EC community has significant experience. Driven by the need for more efficient algorithms and benchmark guidelines for SafeOPs, the objective of this paper is to reignite the EC community's interest in this problem class. To achieve this, we (i) provide a formal definition of SafeOPs and contrast it with other types of optimization problems that the EC community is familiar with, (ii) investigate the impact of key SafeOP parameters on the performance of selected safe optimization algorithms, (iii) benchmark EC against state-of-the-art safe optimization algorithms from the machine learning community, and (iv) provide an open-source Python framework to replicate and extend our work.

* Accepted for GECCO 2022. 8 pages, excluding references, accompanied by supplementary material (4 pages, excluding references). 6 figures (and 6 figures in supplementary material also)

Safe Learning and Optimization Techniques: Towards a Survey of the State of the Art

Feb 18, 2021

Youngmin Kim, Richard Allmendinger, Manuel López-Ibáñez

Abstract:Safe learning and optimization deals with learning and optimization problems that avoid, as much as possible, the evaluation of non-safe input points, which are solutions, policies, or strategies that cause an irrecoverable loss (e.g., breakage of a machine or equipment, or a threat to life). Although a comprehensive survey of safe reinforcement learning algorithms was published in 2015, a number of new algorithms have been proposed since then, and related works in active learning and in optimization were not considered. This paper reviews those algorithms from a number of domains including reinforcement learning, Gaussian process regression and classification, evolutionary algorithms, and active learning. We provide the fundamental concepts on which the reviewed algorithms are based and a characterization of the individual algorithms. We conclude by explaining how the algorithms are connected and by offering suggestions for future research.

* 16 pages. The paper was presented at the 1st TAILOR workshop, ECAI 2020

MMGAN: Manifold Matching Generative Adversarial Network

Apr 12, 2018

Noseong Park, Ankesh Anand, Joel Ruben Antony Moniz, Kookjin Lee, Tanmoy Chakraborty, Jaegul Choo, Hongkyu Park, Youngmin Kim

Abstract:It is well known that GANs are difficult to train, and several different techniques have been proposed in order to stabilize their training. In this paper, we propose a novel training method called manifold-matching, and a new GAN model called manifold-matching GAN (MMGAN). MMGAN finds two manifolds representing the vector representations of real and fake images. If these two manifolds match, it means that real and fake images are statistically identical. To assist the manifold-matching task, we also use i) kernel tricks to find better manifold structures, ii) moving-averaged manifolds across mini-batches, and iii) a regularizer based on the correlation matrix to suppress mode collapse. We conduct in-depth experiments with three image datasets and compare with several state-of-the-art GAN models. In our user study, 32.4% of images generated by the proposed MMGAN are recognized as fake images (a 16% improvement compared to other state-of-the-art models). MMGAN achieved an unsupervised inception score of 7.8 for CIFAR-10.

* the 24th International Conference on Pattern Recognition (ICPR), 2018
