Rafael Redondo

FEP

Listen and Move: Improving GANs Coherency in Agnostic Sound-to-Video Generation

Jun 23, 2024

Rafael Redondo

Abstract: Deep generative models have demonstrated the ability to create realistic audiovisual content, sometimes driven by domains of different nature. However, smooth temporal dynamics in video generation is a challenging problem. This work focuses on generic sound-to-video generation and proposes three main features to enhance both image quality and temporal coherency in generative adversarial models: a triple sound routing scheme, a multi-scale residual and dilated recurrent network for extended sound analysis, and a novel recurrent and directional convolutional layer for video prediction. Each of the proposed features improves, in both quality and coherency, the baseline neural architecture typically used in the SoTA, with the video prediction layer providing an extra temporal refinement.

* Abstract version published in the ICCV 2023 workshop "AV4D: Visual Learning of Sounds in Spaces"
* Full version of the paper of the same title published in the ICCV 2023 workshop "AV4D: Visual Learning of Sounds in Spaces"

Extended Labeled Faces in-the-Wild : Augmenting Classes for Face Segmentation

Jun 24, 2020

Rafael Redondo, Jaume Gibert

Abstract: Existing face datasets often lack sufficient representation of occluding objects, which can hinder recognition, but also supply meaningful information to understand the visual context. In this work, we introduce Extended Labeled Faces in-the-Wild (ELFW), a dataset supplementing with additional face-related categories -- and also additional faces -- the originally released semantic labels in the vastly used Labeled Faces in-the-Wild (LFW) dataset. Additionally, two object-based data augmentation techniques are deployed to synthetically enrich under-represented categories which, in benchmarking experiments, reveal that not only segmenting the augmented categories improves, but also the remaining ones benefit.

* 14 pages, 12 figures

AI in the media and creative industries

May 10, 2019

Giuseppe Amato, Malte Behrmann, Frédéric Bimbot, Baptiste Caramiaux, Fabrizio Falchi, Ander Garcia, Joost Geurts, Jaume Gibert, Guillaume Gravier, Hadmut Holken (+9 more)

Abstract: Thanks to the Big Data revolution and increasing computing capacities, Artificial Intelligence (AI) has made an impressive revival over the past few years and is now omnipresent in both research and industry. The creative sectors have always been early adopters of AI technologies and this continues to be the case. As a matter of fact, recent technological developments keep pushing the boundaries of intelligent systems in creative applications: the critically acclaimed movie "Sunspring", released in 2016, was entirely written by AI technology, and the first-ever Music Album, called "Hello World", produced using AI has been released this year. Simultaneously, the exploratory nature of the creative process is raising important technical challenges for AI such as the ability for AI-powered techniques to be accurate under limited data resources, as opposed to the conventional "Big Data" approach, or the ability to process, analyse and match data from multiple modalities (text, sound, images, etc.) at the same time. The purpose of this white paper is to understand future technological advances in AI and their growing impact on creative industries. This paper addresses the following questions: Where does AI operate in creative Industries? What is its operative role? How will AI transform creative industries in the next ten years? This white paper aims to provide a realistic perspective of the scope of AI actions in creative industries, proposes a vision of how this technology could contribute to research and development works in such context, and identifies research and development challenges.