Ekaterina Sutter

A Variational U-Net for Conditional Appearance and Shape Generation

Apr 12, 2018

Patrick Esser, Ekaterina Sutter, Björn Ommer

[Figures 1-4 for A Variational U-Net for Conditional Appearance and Shape Generation]

Abstract: Deep generative models have demonstrated great performance in image synthesis. However, results deteriorate in the presence of spatial deformations, since these models generate images of objects directly rather than modeling the intricate interplay of their inherent shape and appearance. We present a conditional U-Net for shape-guided image generation, conditioned on the output of a variational autoencoder for appearance. The approach is trained end-to-end on images, without requiring samples of the same object with varying pose or appearance. Experiments show that the model enables conditional image generation and transfer: either shape or appearance can be retained from a query image while the other is freely altered. Moreover, because appearance has a stochastic latent representation, it can be sampled while shape is preserved. In quantitative and qualitative experiments on COCO, DeepFashion, shoes, Market-1501 and handbags, the approach demonstrates significant improvements over the state of the art.

* CVPR 2018 (Spotlight). Project page at https://compvis.github.io/vunet/
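The abstract relies on a variational autoencoder to give appearance a stochastic latent representation. Two standard VAE ingredients make that work: the reparameterization trick, which draws a differentiable sample from the encoder's Gaussian, and the closed-form KL penalty of a diagonal Gaussian posterior against a standard normal prior. The plain-Python sketch below illustrates only these generic components under that assumption; the function names are hypothetical, and it is not the authors' architecture or code.

```python
import math
import random
from typing import List, Sequence

# Hypothetical helper names; generic VAE components, not the authors' code.

def reparameterize(mu: Sequence[float], logvar: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Draw z ~ N(mu, diag(exp(logvar))) as z = mu + sigma * eps, eps ~ N(0, 1).

    Writing the sample this way keeps z differentiable w.r.t. mu and logvar,
    which is what allows a VAE encoder to be trained end-to-end.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


def sample_appearances(dim: int, n: int, rng: random.Random) -> List[List[float]]:
    """Draw n appearance codes from the N(0, I) prior.

    Assumed usage: pair each code with one fixed shape input so that a
    shape-conditioned generator varies appearance while preserving shape.
    """
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
```

For example, `kl_to_standard_normal([1.0], [0.0])` evaluates to `0.5`, the penalty for a unit-variance posterior whose mean sits one unit away from the prior mean.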

Unsupervised Video Understanding by Reconciliation of Posture Similarities

Aug 03, 2017

Timo Milbich, Miguel Bautista, Ekaterina Sutter, Björn Ommer

[Figures 1-4 for Unsupervised Video Understanding by Reconciliation of Posture Similarities]

Abstract: Understanding human activity and being able to explain it in detail goes far beyond mere action classification, in both complexity and value. The challenge is thus to describe an activity on the basis of its most fundamental constituents: the individual postures and their distinctive transitions. Supervised learning of such a fine-grained representation based on elementary poses is very tedious and does not scale. Therefore, we propose a completely unsupervised deep learning procedure based solely on video sequences, which starts from scratch without requiring pre-trained networks, predefined body models, or keypoints. A combinatorial sequence matching algorithm proposes relations between frames from subsets of the training data, while a CNN reconciles the transitivity conflicts between the different subsets to learn a single, coherent pose embedding despite changes in appearance across sequences. Without any manual annotation, the model learns a structured representation of postures and their temporal development. The model enables not only retrieval of similar postures but also temporal super-resolution. Additionally, based on a recurrent formulation, next frames can be synthesized.

* Accepted by ICCV 2017
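The abstract mentions a combinatorial sequence matching step that proposes relations between frames of different videos. Since the matching algorithm itself is not specified here, the sketch below swaps in classic dynamic time warping (DTW) as a stand-in. It aligns two sequences of per-frame feature vectors and returns the matched frame pairs, which can be read as proposed "similar posture" relations. The function names and the choice of Euclidean distance are assumptions for illustration, not the paper's method.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two per-frame feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_align(seq_a: Sequence[Vector],
              seq_b: Sequence[Vector]) -> Tuple[float, List[Tuple[int, int]]]:
    """Dynamic time warping between two frame sequences (a stand-in matcher).

    Returns the total alignment cost and the matched (i, j) frame pairs.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i frames of a with the first j of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from (n, m), always stepping to the cheapest predecessor
    # (ties prefer the diagonal because it is listed first).
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): cost[i - 1][j - 1],
                 (i - 1, j): cost[i - 1][j],
                 (i, j - 1): cost[i][j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path
```

On the toy 1-D sequences `[[0], [1], [2]]` and `[[0], [0], [1], [2]]`, `dtw_align` returns cost `0.0` and pairs `[(0, 0), (0, 1), (1, 2), (2, 3)]`: the repeated first frame of the second sequence is absorbed by matching frame 0 twice.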

CliqueCNN: Deep Unsupervised Exemplar Learning

Aug 31, 2016

Miguel A. Bautista, Artsiom Sanakoyeu, Ekaterina Sutter, Björn Ommer

[Figures 1-4 for CliqueCNN: Deep Unsupervised Exemplar Learning]

Abstract: Exemplar learning is a powerful paradigm for discovering visual similarities in an unsupervised manner. In this context, however, the recent breakthrough in deep learning has not yet been able to unfold its full potential. With only a single positive sample, a great imbalance between one positive and many negatives, and unreliable relationships between most samples, training of convolutional neural networks is impaired. Given weak estimates of local distance, we propose a single optimization problem to extract batches of samples with mutually consistent relations. Conflicting relations are distributed over different batches, and similar samples are grouped into compact cliques. Learning exemplar similarities is framed as a sequence of clique categorization tasks. The CNN then consolidates transitivity relations within and between cliques and learns a single representation for all samples without the need for labels. The proposed unsupervised approach has shown competitive performance on detailed posture analysis and object classification.

* Accepted for publication at NIPS 2016
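The abstract frames exemplar learning as categorizing samples into compact cliques of mutually similar samples. The paper obtains these cliques from a single optimization problem; the sketch below swaps in a much simpler greedy heuristic, purely to show what "compact clique" and "clique label" mean operationally. It assumes a precomputed symmetric similarity matrix `sim` and a hypothetical `threshold`; neither the names nor the procedure come from the paper.

```python
from typing import List, Sequence


def greedy_cliques(sim: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Greedily group samples into compact cliques (a stand-in heuristic).

    A candidate joins the current clique only if its similarity to *every*
    current member is at least `threshold`, so each clique is internally
    consistent. Every sample ends up in exactly one clique (possibly a singleton).
    """
    unassigned = set(range(len(sim)))
    cliques: List[List[int]] = []
    while unassigned:
        seed = min(unassigned)
        unassigned.remove(seed)
        clique = [seed]
        # Consider candidates in order of decreasing similarity to the seed.
        for cand in sorted(unassigned, key=lambda k: -sim[seed][k]):
            if all(sim[cand][member] >= threshold for member in clique):
                clique.append(cand)
        for member in clique[1:]:
            unassigned.remove(member)
        cliques.append(clique)
    return cliques


def clique_labels(cliques: Sequence[Sequence[int]], n: int) -> List[int]:
    """Turn cliques into per-sample pseudo-labels for a categorization task."""
    labels = [-1] * n
    for label, clique in enumerate(cliques):
        for i in clique:
            labels[i] = label
    return labels
```

For a similarity matrix in which samples 0-2 are mutually similar above 0.5, samples 3-4 form a second such pair, and all cross-group similarities fall below 0.5, a threshold of 0.5 yields the cliques `[[0, 1, 2], [3, 4]]` and the labels `[0, 0, 0, 1, 1]`.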
