Roman Suvorov

Resolution-robust Large Mask Inpainting with Fourier Convolutions

Sep 15, 2021

Roman Suvorov, Elizaveta Logacheva, Anton Mashikhin, Anastasia Remizova, Arsenii Ashukha, Aleksei Silvestrov, Naejin Kong, Harshith Goka, Kiwoong Park, Victor Lempitsky

Abstract: Modern image inpainting systems, despite significant progress, often struggle with large missing areas, complex geometric structures, and high-resolution images. We find that one of the main reasons for this is the lack of an effective receptive field in both the inpainting network and the loss function. To alleviate this issue, we propose a new method called large mask inpainting (LaMa). LaMa is based on i) a new inpainting network architecture that uses fast Fourier convolutions, which have an image-wide receptive field; ii) a high receptive field perceptual loss; and iii) large training masks, which unlock the potential of the first two components. Our inpainting network improves the state of the art across a range of datasets and achieves excellent performance even in challenging scenarios, e.g. completion of periodic structures. Our model generalizes surprisingly well to resolutions higher than those seen at train time, and achieves this at lower parameter and compute costs than competitive baselines. The code is available at https://github.com/saic-mdal/lama.
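
The "image-wide receptive field" claim can be illustrated with a small 1-D sketch. This is not the LaMa code: it assumes a toy layer with per-frequency weights (a simplification, and not necessarily how fast Fourier convolutions are parameterized), and the names `spectral_layer`, `local_conv3`, and `affected_positions` are hypothetical. It only shows that a pointwise operation in the frequency domain can change every output position when a single input position changes, whereas a 3-tap convolution reaches only its neighbors.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def spectral_layer(x, weights):
    """Toy layer (assumption): scale each frequency by its own weight."""
    scaled = [w * c for w, c in zip(weights, dft(x))]
    return [value.real for value in idft(scaled)]


def local_conv3(x, kernel):
    """Ordinary 3-tap circular convolution, used here for comparison."""
    n = len(x)
    return [sum(kernel[j] * x[(i + j - 1) % n] for j in range(3)) for i in range(n)]


def affected_positions(layer, n, pos=0):
    """Output indices that change when input position `pos` is perturbed."""
    base = [0.0] * n
    bumped = list(base)
    bumped[pos] = 1.0
    before, after = layer(base), layer(bumped)
    return [i for i in range(n) if abs(before[i] - after[i]) > 1e-9]


N = 16
# Hypothetical weights: the spectrum of a dense, decaying circular kernel.
dense_kernel = [0.5 ** min(t, N - t) for t in range(N)]
weights = dft(dense_kernel)

spectral_reach = affected_positions(lambda x: spectral_layer(x, weights), N)
local_reach = affected_positions(lambda x: local_conv3(x, [0.25, 0.5, 0.25]), N)
print(len(spectral_reach), local_reach)
```

In this toy setting the spectral layer touches all 16 positions in a single step, while the local convolution touches only the perturbed position and its two neighbors; a stack of local layers would need many steps to cover the same span.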

Perceptual Gradient Networks

May 05, 2021

Dmitry Nikulin, Roman Suvorov, Aleksei Ivakhnenko, Victor Lempitsky

Abstract: Many applications of deep learning for image generation use perceptual losses for either training or fine-tuning of the generator networks. The use of a perceptual loss, however, incurs repeated forward-backward passes through a large image classification network, as well as a considerable memory overhead required to store the activations of this network. It is therefore desirable, and sometimes even critical, to get rid of these overheads. In this work, we propose a way to train generator networks using approximations of perceptual loss that are computed without forward-backward passes. Instead, we use a simpler perceptual gradient network that directly synthesizes the gradient field of a perceptual loss. We introduce the concept of proxy targets, which stabilize the predicted gradient, meaning that learning with it does not lead to divergence or oscillations. In addition, our method allows interpretation of the predicted gradient, providing insight into the internals of perceptual loss and suggesting potential ways to improve it in future work.

* 28 pages, 15 figures, 8 tables

DeepLandscape: Adversarial Modeling of Landscape Video

Aug 21, 2020

Elizaveta Logacheva, Roman Suvorov, Oleg Khomenko, Anton Mashikhin, Victor Lempitsky

Abstract: We build a new model of landscape videos that can be trained on a mixture of static landscape images and landscape animations. Our architecture extends the StyleGAN model by augmenting it with parts that allow modeling dynamic changes in a scene. Once trained, our model can be used to generate realistic time-lapse landscape videos with moving objects and time-of-day changes. Furthermore, by fitting the learned model to a static landscape image, the latter can be reenacted in a realistic way. We propose simple but necessary modifications to the StyleGAN inversion procedure, which lead to in-domain latent codes and allow real images to be manipulated. Quantitative comparisons and user studies suggest that our model produces more compelling animations of given photographs than previously proposed methods. The results of our approach, including comparisons with prior art, can be seen in the supplementary materials and on the project page: https://saic-mdal.github.io/deep-landscape

* Accepted at ECCV 2020

Learning State Representations in Complex Systems with Multimodal Data

Nov 30, 2018

Pavel Solovev, Vladimir Aliev, Pavel Ostyakov, Gleb Sterkin, Elizaveta Logacheva, Stepan Troeshestov, Roman Suvorov, Anton Mashikhin, Oleg Khomenko, Sergey I. Nikolenko

Abstract: Representation learning becomes especially important for complex systems with multimodal data sources such as cameras or sensors. Recent advances in reinforcement learning and optimal control make it possible to design control algorithms on these latent representations, but the field still lacks a large-scale standard dataset for unified comparison. In this work, we present a large-scale dataset and evaluation framework for representation learning for the complex task of landing an airplane. We implement and compare several approaches to representation learning on this dataset in terms of the quality of simple supervised learning tasks and disentanglement scores. The resulting representations can be used for further tasks such as anomaly detection, optimal control, model-based reinforcement learning, and other applications.

* Fixed references

SEIGAN: Towards Compositional Image Generation by Simultaneously Learning to Segment, Enhance, and Inpaint

Nov 19, 2018

Pavel Ostyakov, Roman Suvorov, Elizaveta Logacheva, Oleg Khomenko, Sergey I. Nikolenko

Abstract: We present a novel approach to image manipulation and understanding by simultaneously learning to segment object masks, paste objects onto another background image, and remove them from the original images. For this purpose, we develop a novel generative model for compositional image generation, SEIGAN (Segment-Enhance-Inpaint Generative Adversarial Network), which learns these three operations together in an adversarial architecture with additional cycle consistency losses. To train, SEIGAN needs only bounding box supervision and does not require pairing or ground truth masks. SEIGAN produces better generated images (as evaluated by human assessors) than other approaches and produces high-quality segmentation masks, improving over other adversarially trained approaches and getting closer to the results of fully supervised training.

Label Denoising with Large Ensembles of Heterogeneous Neural Networks

Sep 12, 2018

Pavel Ostyakov, Elizaveta Logacheva, Roman Suvorov, Vladimir Aliev, Gleb Sterkin, Oleg Khomenko, Sergey I. Nikolenko

Abstract: Despite recent advances in computer vision based on various convolutional architectures, video understanding remains an important challenge. In this work, we present and discuss a top solution for the large-scale video classification (labeling) problem introduced as a Kaggle competition based on the YouTube-8M dataset. We show and compare different approaches to preprocessing, data augmentation, model architectures, and model combination. Our final model is based on a large ensemble of video- and frame-level models but fits within rather restrictive hardware constraints. We apply an approach based on knowledge distillation to deal with noisy labels in the original dataset, and the recently developed mixup technique to improve the basic models.
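
For readers unfamiliar with mixup, the technique can be sketched as a convex combination of two training examples and their label vectors. This is a generic illustration, not the authors' pipeline: the function name `mixup`, the Beta parameter `alpha=0.4`, and the toy feature and multi-label vectors are all assumptions made for the example.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.4, rng=random):
    """Blend two examples with a weight drawn from Beta(alpha, alpha).

    alpha=0.4 is an illustrative default, not a value taken from the paper.
    """
    lam = rng.betavariate(alpha, alpha)

    def blend(u, v):
        return [lam * a + (1.0 - lam) * b for a, b in zip(u, v)]

    return blend(x_a, x_b), blend(y_a, y_b), lam


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Toy feature vectors and multi-hot label vectors (hypothetical data).
features_a, labels_a = [1.0, 0.0, 2.0], [1.0, 0.0, 1.0]
features_b, labels_b = [0.0, 4.0, 2.0], [0.0, 1.0, 1.0]

mixed_x, mixed_y, lam = mixup(features_a, labels_a, features_b, labels_b, rng=rng)
print(lam, mixed_x, mixed_y)
```

A label shared by both examples keeps a target of 1, while a label present in only one example receives a soft target equal to that example's mixing weight.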

The Limitations of Cross-language Word Embeddings Evaluation

Jun 06, 2018

Amir Bakarov, Roman Suvorov, Ilya Sochenkov

Abstract: The aim of this work is to explore the possible limitations of existing methods for evaluating cross-language word embeddings, addressing the lack of correlation between intrinsic and extrinsic cross-language evaluation methods. To test this hypothesis, we construct English-Russian datasets for extrinsic and intrinsic evaluation tasks and compare the performance of five different cross-language models on them. The results show that the scores, even on different intrinsic benchmarks, do not correlate with each other. We conclude that using human references as ground truth for cross-language word embeddings is not appropriate unless one understands how native speakers process semantics in their cognition.
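
One common way to check whether two benchmarks agree is to rank the models on each and compute a rank correlation. The sketch below uses Spearman's coefficient as an example; the abstract does not say which correlation measure the authors used, the function names are hypothetical, and the scores are made-up placeholders, not results from the paper. It assumes there are no tied scores.

```python
def ranks(values):
    """Rank positions (1 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(scores_a, scores_b):
    """Spearman rank correlation for two equally long, tie-free score lists."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    n = len(scores_a)
    squared_diffs = sum(
        (ra - rb) ** 2 for ra, rb in zip(ranks(scores_a), ranks(scores_b))
    )
    return 1.0 - 6.0 * squared_diffs / (n * (n * n - 1))


# Placeholder scores for five models on two benchmarks (NOT from the paper).
hypothetical_benchmark_a = [0.61, 0.58, 0.70, 0.55, 0.66]
hypothetical_benchmark_b = [0.40, 0.52, 0.45, 0.50, 0.38]

rho = spearman(hypothetical_benchmark_a, hypothetical_benchmark_b)
print(rho)
```

A coefficient near 1 would mean the two benchmarks order the models the same way; values near zero or below indicate that a model's standing on one benchmark says little about its standing on the other.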

* In Proceedings of the 7th Joint Conference on Lexical and Computational Semantics (*SEM 2018)
