Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Lee Hyun

SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models

Dec 15, 2023

Lee Hyun, Kim Sung-Bin, Seungju Han, Youngjae Yu, Tae-Hyun Oh

Figure 1 for SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models

Figure 2 for SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models

Figure 3 for SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models

Figure 4 for SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models

Abstract:Despite the recent advances of the artificial intelligence, building social intelligence remains a challenge. Among social signals, laughter is one of the distinctive expressions that occurs during social interactions between humans. In this work, we tackle a new challenge for machines to understand the rationale behind laughter in video, Video Laugh Reasoning. We introduce this new task to explain why people laugh in a particular video and a dataset for this task. Our proposed dataset, SMILE, comprises video clips and language descriptions of why people laugh. We propose a baseline by leveraging the reasoning capacity of large language models (LLMs) with textual video representation. Experiments show that our baseline can generate plausible explanations for laughter. We further investigate the scalability of our baseline by probing other video understanding tasks and in-the-wild videos. We release our dataset, code, and model checkpoints on https://github.com/SMILE-data/SMILE.

* 18 pages, 14 figures

Via

Access Paper or Ask Questions

LaughTalk: Expressive 3D Talking Head Generation with Laughter

Nov 02, 2023

Kim Sung-Bin, Lee Hyun, Da Hye Hong, Suekyeong Nam, Janghoon Ju, Tae-Hyun Oh

Figure 1 for LaughTalk: Expressive 3D Talking Head Generation with Laughter

Figure 2 for LaughTalk: Expressive 3D Talking Head Generation with Laughter

Figure 3 for LaughTalk: Expressive 3D Talking Head Generation with Laughter

Figure 4 for LaughTalk: Expressive 3D Talking Head Generation with Laughter

Abstract:Laughter is a unique expression, essential to affirmative social interactions of humans. Although current 3D talking head generation methods produce convincing verbal articulations, they often fail to capture the vitality and subtleties of laughter and smiles despite their importance in social context. In this paper, we introduce a novel task to generate 3D talking heads capable of both articulate speech and authentic laughter. Our newly curated dataset comprises 2D laughing videos paired with pseudo-annotated and human-validated 3D FLAME parameters and vertices. Given our proposed dataset, we present a strong baseline with a two-stage training scheme: the model first learns to talk and then acquires the ability to express laughter. Extensive experiments demonstrate that our method performs favorably compared to existing approaches in both talking head generation and expressing laughter signals. We further explore potential applications on top of our proposed method for rigging realistic avatars.

* Accepted to WACV2024

Via

Access Paper or Ask Questions

A Large-Scale 3D Face Mesh Video Dataset via Neural Re-parameterized Optimization

Oct 06, 2023

Kim Youwang, Lee Hyun, Kim Sung-Bin, Suekyeong Nam, Janghoon Ju, Tae-Hyun Oh

Abstract:We propose NeuFace, a 3D face mesh pseudo annotation method on videos via neural re-parameterized optimization. Despite the huge progress in 3D face reconstruction methods, generating reliable 3D face labels for in-the-wild dynamic videos remains challenging. Using NeuFace optimization, we annotate the per-view/-frame accurate and consistent face meshes on large-scale face videos, called the NeuFace-dataset. We investigate how neural re-parameterization helps to reconstruct image-aligned facial details on 3D meshes via gradient analysis. By exploiting the naturalness and diversity of 3D faces in our dataset, we demonstrate the usefulness of our dataset for 3D face-related tasks: improving the reconstruction accuracy of an existing 3D face reconstruction model and learning 3D facial motion prior. Code and datasets will be available at https://neuface-dataset.github.io.

* 9 pages, 7 figures, and 3 tables for the main paper. 8 pages, 6 figures and 3 tables for the appendix

Via

Access Paper or Ask Questions

ComMU: Dataset for Combinatorial Music Generation

Nov 17, 2022

Lee Hyun, Taehyun Kim, Hyolim Kang, Minjoo Ki, Hyeonchan Hwang, Kwanho Park, Sharang Han, Seon Joo Kim

Figure 1 for ComMU: Dataset for Combinatorial Music Generation

Figure 2 for ComMU: Dataset for Combinatorial Music Generation

Figure 3 for ComMU: Dataset for Combinatorial Music Generation

Figure 4 for ComMU: Dataset for Combinatorial Music Generation

Abstract:Commercial adoption of automatic music composition requires the capability of generating diverse and high-quality music suitable for the desired context (e.g., music for romantic movies, action games, restaurants, etc.). In this paper, we introduce combinatorial music generation, a new task to create varying background music based on given conditions. Combinatorial music generation creates short samples of music with rich musical metadata, and combines them to produce a complete music. In addition, we introduce ComMU, the first symbolic music dataset consisting of short music samples and their corresponding 12 musical metadata for combinatorial music generation. Notable properties of ComMU are that (1) dataset is manually constructed by professional composers with an objective guideline that induces regularity, and (2) it has 12 musical metadata that embraces composers' intentions. Our results show that we can generate diverse high-quality music only with metadata, and that our unique metadata such as track-role and extended chord quality improves the capacity of the automatic composition. We highly recommend watching our video before reading the paper (https://pozalabs.github.io/ComMU).

* 19 pages, 12 figures

Via

Access Paper or Ask Questions