Denis Korzhenkov

Mobile Video Diffusion

Dec 10, 2024

Haitam Ben Yahia, Denis Korzhenkov, Ioannis Lelekas, Amir Ghodrati, Amirhossein Habibian

Abstract:Video diffusion models have achieved impressive realism and controllability but are limited by high computational demands, restricting their use on mobile devices. This paper introduces the first mobile-optimized video diffusion model. Starting from a spatio-temporal UNet from Stable Video Diffusion (SVD), we reduce memory and computational cost by reducing the frame resolution, incorporating multi-scale temporal representations, and introducing two novel pruning schemes to reduce the number of channels and temporal blocks. Furthermore, we employ adversarial finetuning to reduce the denoising to a single step. Our model, coined MobileVD, is 523x more efficient (1817.2 vs. 4.34 TFLOPs) with a slight quality drop (FVD 149 vs. 171), generating latents for a 14x512x256 px clip in 1.7 seconds on a Xiaomi-14 Pro. Our results are available at https://qualcomm-ai-research.github.io/mobile-video-diffusion/

On Sampling Strategies for Spectral Model Sharding

Oct 31, 2024

Denis Korzhenkov, Christos Louizos

Abstract:The problem of heterogeneous clients in federated learning has recently drawn a lot of attention. Spectral model sharding, i.e., partitioning the model parameters into low-rank matrices based on the singular value decomposition, has been one of the proposed solutions for more efficient on-device training in such settings. In this work, we present two sampling strategies for such sharding, obtained as solutions to specific optimization problems. The first produces unbiased estimators of the original weights, while the second aims to minimize the squared approximation error. We discuss how both of these estimators can be incorporated in the federated learning loop and practical considerations that arise during local training. Empirically, we demonstrate that both of these methods can lead to improved performance on various commonly used datasets.

* Accepted to NeurIPS 2024
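
The unbiasedness property mentioned in the abstract above can be illustrated with a minimal sketch. It is not either of the paper's sampling strategies: it assumes the simplest possible scheme, in which each singular component is kept independently with probability `keep_prob[i]` and rescaled by `1 / keep_prob[i]`. The function name `sample_spectral_shard` and all parameters are hypothetical.

```python
import numpy as np

def sample_spectral_shard(weight, keep_prob, rng):
    """Return a random low-rank estimate of `weight` built from sampled singular components.

    Assumption: independent Bernoulli sampling with strictly positive probabilities.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    keep_prob = np.asarray(keep_prob, dtype=float)
    # Decide independently for each singular component whether it is kept.
    mask = rng.random(s.shape[0]) < keep_prob
    # Inverse-probability scaling makes the estimate unbiased: E[mask / p] = 1.
    scale = np.where(mask, 1.0 / keep_prob, 0.0)
    return (u * (s * scale)) @ vt

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))
p = np.full(4, 0.5)  # keep each of the 4 components half of the time

# Averaging many independent shards should approach the original weights.
estimates = [sample_spectral_shard(W, p, rng) for _ in range(5000)]
mean_estimate = np.mean(estimates, axis=0)
print(np.abs(mean_estimate - W).max())
```

Each individual shard has rank at most the number of kept components, so it is cheaper to train on a constrained client, while the average over shards recovers the full matrix in expectation.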

A Mutual Information Perspective on Federated Contrastive Learning

May 03, 2024

Christos Louizos, Matthias Reisser, Denis Korzhenkov

Abstract:We investigate contrastive learning in the federated setting through the lens of SimCLR and multi-view mutual information maximization. In doing so, we uncover a connection between contrastive representation learning and user verification; by adding a user verification loss to each client's local SimCLR loss we recover a lower bound to the global multi-view mutual information. To accommodate the case where some labelled data are available at the clients, we extend our SimCLR variant to the federated semi-supervised setting. We see that a supervised SimCLR objective can be obtained with two changes: a) the contrastive loss is computed between datapoints that share the same label and b) we require an additional auxiliary head that predicts the correct labels from either of the two views. Along with the proposed SimCLR extensions, we also study how different sources of non-i.i.d.-ness can impact the performance of federated unsupervised learning through global mutual information maximization; we find that a global objective is beneficial for some sources of non-i.i.d.-ness but can be detrimental for others. We empirically evaluate our proposed extensions in various tasks to validate our claims and furthermore demonstrate that our proposed modifications generalize to other pretraining methods.

* Published as a conference paper at ICLR 2024
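
For readers unfamiliar with the local SimCLR loss that the abstract above builds on, the following is a minimal sketch of the standard NT-Xent objective for one batch of paired views. It does not include the paper's user verification term, the supervised variant, or anything federated. The function name `nt_xent_loss` and the default temperature are assumptions made for illustration.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss; row i of z1 and row i of z2 are two views of one datapoint."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit vectors give cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                    # a sample is never compared with itself
    # The positive for row i is the other view of the same datapoint.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-sum-exp over all other samples in the batch.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob = sim[np.arange(2 * n), positives] - log_norm
    return -log_prob.mean()

rng = np.random.default_rng(0)
anchor = rng.standard_normal((8, 16))
aligned_view = anchor + 0.01 * rng.standard_normal((8, 16))  # nearly identical second view
random_view = rng.standard_normal((8, 16))                   # unrelated second view

loss_aligned = nt_xent_loss(anchor, aligned_view)
loss_random = nt_xent_loss(anchor, random_view)
print(loss_aligned, loss_random)
```

The loss is lower when the two views of each datapoint agree, which is the behaviour the contrastive objective rewards.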

Self-improving Multiplane-to-layer Images for Novel View Synthesis

Oct 04, 2022

Pavel Solovev, Taras Khakhulin, Denis Korzhenkov

Abstract:We present a new method for lightweight novel-view synthesis that generalizes to an arbitrary forward-facing scene. Recent approaches are computationally expensive, require per-scene optimization, or produce a memory-expensive representation. We start by representing the scene with a set of fronto-parallel semitransparent planes and afterward convert them to deformable layers in an end-to-end manner. Additionally, we employ a feed-forward refinement procedure that corrects the estimated representation by aggregating information from input views. Our method does not require fine-tuning when a new scene is processed and can handle an arbitrary number of views without restrictions. Experimental results show that our approach surpasses recent models in terms of common metrics and human evaluation, with a noticeable advantage in inference speed and compactness of the inferred layered geometry. See https://samsunglabs.github.io/MLI

* Accepted for WACV 2023

Stereo Magnification with Multi-Layer Images

Jan 13, 2022

Taras Khakhulin, Denis Korzhenkov, Pavel Solovev, Gleb Sterkin, Timotei Ardelean, Victor Lempitsky

Abstract:Representing scenes with multiple semi-transparent colored layers has been a popular and successful choice for real-time novel view synthesis. Existing approaches infer colors and transparency values over regularly-spaced layers of planar or spherical shape. In this work, we introduce a new view synthesis approach based on multiple semi-transparent layers with scene-adapted geometry. Our approach infers such representations from stereo pairs in two stages. The first stage infers the geometry of a small number of data-adaptive layers from a given pair of views. The second stage infers the color and the transparency values for these layers, producing the final representation for novel view synthesis. Importantly, both stages are connected through a differentiable renderer and are trained in an end-to-end manner. In the experiments, we demonstrate the advantage of the proposed approach over the use of regularly-spaced layers with no adaptation to scene geometry. Despite being orders of magnitude faster during rendering, our approach also outperforms the recently proposed IBRNet system based on implicit geometry representation. See results at https://samsunglabs.github.io/StereoLayers.
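
Rendering from a stack of semi-transparent colored layers, as described in the abstract above, typically ends with back-to-front alpha compositing. The sketch below shows only that final blending step and assumes the layers have already been projected into the target view; the geometry inference and warping are not shown. The function name `composite_layers` and the array layout are assumptions for illustration.

```python
import numpy as np

def composite_layers(colors, alphas):
    """Blend layers ordered from back to front with the 'over' operator.

    colors: (L, H, W, 3) per-layer RGB; alphas: (L, H, W, 1) opacity in [0, 1].
    """
    image = np.zeros_like(colors[0])
    for rgb, alpha in zip(colors, alphas):
        # A nearer layer covers what is behind it in proportion to its opacity.
        image = alpha * rgb + (1.0 - alpha) * image
    return image

# Two 2x2 layers: an opaque red background and a 25%-opaque blue foreground.
red = np.zeros((2, 2, 3))
red[..., 0] = 1.0
blue = np.zeros((2, 2, 3))
blue[..., 2] = 1.0
colors = np.stack([red, blue])
alphas = np.stack([np.ones((2, 2, 1)), np.full((2, 2, 1), 0.25)])

rendered = composite_layers(colors, alphas)
print(rendered[0, 0])
```

Every operation in the loop is differentiable with respect to the colors and opacities, which is what allows a renderer of this kind to sit inside an end-to-end training pipeline.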

Image Generators with Conditionally-Independent Pixel Synthesis

Nov 27, 2020

Ivan Anokhin, Kirill Demochkin, Taras Khakhulin, Gleb Sterkin, Victor Lempitsky, Denis Korzhenkov

Abstract:Existing image generator networks rely heavily on spatial convolutions and, optionally, self-attention blocks in order to gradually synthesize images in a coarse-to-fine manner. Here, we present a new architecture for image generators, where the color value at each pixel is computed independently given the value of a random latent vector and the coordinate of that pixel. No spatial convolutions or similar operations that propagate information across pixels are involved during the synthesis. We analyze the modeling capabilities of such generators when trained in an adversarial fashion, and observe the new generators to achieve similar generation quality to state-of-the-art convolutional generators. We also investigate several interesting properties unique to the new architecture.
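
The idea of computing each pixel independently from a shared latent vector and that pixel's coordinate can be sketched as follows. This is a toy illustration with a plain two-layer MLP, fixed Fourier coordinate features, and random untrained weights; it is not the architecture from the paper. All names (`fourier_features`, `synthesize_pixels`, `params`) are hypothetical.

```python
import numpy as np

def fourier_features(coords, num_freqs=4):
    """Encode (x, y) coordinates in [0, 1] with sines and cosines of growing frequency."""
    freqs = 2.0 ** np.arange(num_freqs) * np.pi
    angles = coords[:, :, None] * freqs                # (N, 2, num_freqs)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(coords.shape[0], -1)          # (N, 4 * num_freqs)

def synthesize_pixels(coords, latent, params):
    """Map each coordinate plus the shared latent vector to an RGB value, row by row."""
    w1, b1, w2, b2 = params
    latent_tiled = np.broadcast_to(latent, (coords.shape[0], latent.shape[0]))
    x = np.concatenate([fourier_features(coords), latent_tiled], axis=1)
    hidden = np.maximum(x @ w1 + b1, 0.0)              # ReLU; rows never interact
    return 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))   # sigmoid keeps colors in (0, 1)

rng = np.random.default_rng(0)
latent_dim, hidden_dim, size = 8, 32, 16
in_dim = 4 * 4 + latent_dim                            # Fourier features + latent
params = (rng.standard_normal((in_dim, hidden_dim)) * 0.5, np.zeros(hidden_dim),
          rng.standard_normal((hidden_dim, 3)) * 0.5, np.zeros(3))

ys, xs = np.meshgrid(np.linspace(0, 1, size), np.linspace(0, 1, size), indexing="ij")
coords = np.stack([xs.ravel(), ys.ravel()], axis=1)    # one (x, y) row per pixel
latent = rng.standard_normal(latent_dim)

image = synthesize_pixels(coords, latent, params).reshape(size, size, 3)
single = synthesize_pixels(coords[37:38], latent, params)  # one pixel on its own
print(image.shape, np.allclose(single[0], image.reshape(-1, 3)[37]))
```

Because no operation mixes information across rows, evaluating one pixel in isolation gives the same color as evaluating it as part of the full grid.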

High-Resolution Daytime Translation Without Domain Labels

Mar 23, 2020

Ivan Anokhin, Pavel Solovev, Denis Korzhenkov, Alexey Kharlamov, Taras Khakhulin, Alexey Silvestrov, Sergey Nikolenko, Victor Lempitsky, Gleb Sterkin

Abstract:Modeling daytime changes in high-resolution photographs, e.g., re-rendering the same scene under different illuminations typical for day, night, or dawn, is a challenging image manipulation task. We present the high-resolution daytime translation (HiDT) model for this task. HiDT combines a generative image-to-image model and a new upsampling scheme that allows image translation to be applied at high resolution. The model demonstrates competitive results in terms of both commonly used GAN metrics and human evaluation. Importantly, this good performance comes as a result of training on a dataset of still landscape images with no daytime labels available. Our results are available at https://saic-mdal.github.io/HiDT/.

* Accepted to CVPR 2020

YASENN: Explaining Neural Networks via Partitioning Activation Sequences

Nov 07, 2018

Yaroslav Zharov, Denis Korzhenkov, Pavel Shvechikov, Alexander Tuzhilin

Abstract:We introduce a novel approach to feed-forward neural network interpretation based on partitioning the space of sequences of neuron activations. In line with this approach, we propose a model-specific interpretation method, called YASENN. Our method inherits many advantages of model-agnostic distillation, such as an ability to focus on the particular input region and to express an explanation in terms of features different from those observed by a neural network. Moreover, examination of distillation error makes the method applicable to the problems with low tolerance to interpretation mistakes. Technically, YASENN distills the network with an ensemble of layer-wise gradient boosting decision trees and encodes the sequences of neuron activations with leaf indices. The finite number of unique codes induces a partitioning of the input space. Each partition may be described in a variety of ways, including examination of an interpretable model (e.g. a logistic regression or a decision tree) trained to discriminate between objects of those partitions. Our experiments provide an intuition behind the method and demonstrate revealed artifacts in neural network decision making.
