Title: Unsupervised Multimodal Video-to-Video Translation via Self-Supervised Learning

Apr 14, 2020

Kangning Liu, Shuhang Gu, Andres Romero, Radu Timofte

Abstract: Existing unsupervised video-to-video translation methods fail to produce translated videos that are frame-wise realistic, semantic-information preserving, and consistent at the video level. In this work, we propose UVIT, a novel unsupervised video-to-video translation model. Our model decomposes style and content, uses a specialized encoder-decoder structure, and propagates inter-frame information through bidirectional recurrent neural network (RNN) units. The style-content decomposition mechanism enables style-consistent video translation and provides a good interface for modality-flexible translation. In addition, by changing the input frames and style codes incorporated in our translation, we propose a video interpolation loss, which captures temporal information within the sequence to train our building blocks in a self-supervised manner. Our model can produce photo-realistic, spatio-temporally consistent translated videos in a multimodal way. Subjective and objective experimental results validate the superiority of our model over existing methods. More details can be found on our project website: https://uvit.netlify.com
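To make the idea of a self-supervised interpolation objective concrete, here is a minimal sketch. It is not the UVIT loss itself: the abstract does not give the formulation, so everything below is an assumption for illustration. Each interior frame is hidden, predicted from its two neighbors, and compared with the real frame using an L1 distance. The names `Frame`, `average_neighbors`, `l1_distance`, and `interpolation_loss` are hypothetical, and the averaging predictor merely stands in for a learned translation network.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for an image tensor: one flattened frame of pixel values.
Frame = List[float]


def average_neighbors(prev_frame: Frame, next_frame: Frame) -> Frame:
    """Placeholder predictor (an assumption): per-pixel mean of the two neighbors.

    A real model would use a learned network here.
    """
    return [(p + n) / 2.0 for p, n in zip(prev_frame, next_frame)]


def l1_distance(a: Frame, b: Frame) -> float:
    """Mean absolute difference between two frames of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def interpolation_loss(
    video: Sequence[Frame],
    predictor: Callable[[Frame, Frame], Frame] = average_neighbors,
) -> float:
    """Average L1 error when each interior frame is predicted from its neighbors.

    No labels are needed: the held-out frame itself is the training target,
    which is what makes the objective self-supervised.
    """
    if len(video) < 3:
        raise ValueError("need at least three frames to hold one out")
    total = 0.0
    for t in range(1, len(video) - 1):
        predicted = predictor(video[t - 1], video[t + 1])
        total += l1_distance(predicted, video[t])
    return total / (len(video) - 2)
```

With the placeholder predictor, a clip whose pixel values change linearly over time gives zero loss, while a clip with an abrupt middle frame gives a positive loss. That is the temporal signal such an objective provides for training.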
