Guillaume Sautière

Neural Image Compression with a Diffusion-Based Decoder

Jan 23, 2023

Noor Fathima Ghouse, Jens Petersen, Auke Wiggers, Tianlin Xu, Guillaume Sautière

Abstract: Diffusion probabilistic models have recently achieved remarkable success in generating high-quality image and video data. In this work, we build on this class of generative models and introduce a method for lossy compression of high-resolution images. The resulting codec, which we call DIffusion-based Residual Augmentation Codec (DIRAC), is the first neural codec to allow smooth traversal of the rate-distortion-perception tradeoff at test time, while obtaining competitive performance with GAN-based methods in perceptual quality. Furthermore, while sampling from diffusion probabilistic models is notoriously expensive, we show that in the compression setting the number of steps can be drastically reduced.

* v1: 26 pages, 13 figures; v2: corrected typo in first author name in arXiv metadata

Region-of-Interest Based Neural Video Compression

Mar 03, 2022

Yura Perugachi-Diaz, Guillaume Sautière, Davide Abati, Yang Yang, Amirhossein Habibian, Taco S. Cohen

Abstract: Humans do not perceive all parts of a scene with the same resolution, but rather focus on a few regions of interest (ROIs). Traditional object-based codecs take advantage of this biological intuition and are capable of non-uniform allocation of bits in favor of salient regions, at the expense of increased distortion in the remaining areas: such a strategy boosts perceptual quality under low rate constraints. Recently, several neural codecs have been introduced for video compression, yet they operate uniformly over all spatial locations, lacking the capability of ROI-based processing. In this paper, we introduce two models for ROI-based neural video coding. First, we propose an implicit model that is fed with a binary ROI mask and trained by de-emphasizing the distortion of the background. Second, we design an explicit latent scaling method that allows control over the quantization bin width for different spatial regions of latent variables, conditioned on the ROI mask. Through extensive experiments, we show that our methods outperform all our baselines in terms of rate-distortion (R-D) performance in the ROI. Moreover, they can generalize to different datasets and to any arbitrary ROI at inference time. Finally, they do not require expensive pixel-level annotations during training, as synthetic ROI masks can be used with little to no degradation in performance. To the best of our knowledge, our proposals are the first solutions that integrate ROI-based capabilities into neural video compression models.
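
The two mechanisms in this abstract (a background-de-emphasized distortion loss for the implicit model, and ROI-dependent quantization bin widths for the explicit model) can be illustrated with a minimal pure-Python sketch. Everything below is an assumption for illustration, not the authors' code: the function names, the fixed background weight `bg_weight`, and the fixed bin widths `roi_bin`/`bg_bin` are hypothetical, whereas in the paper the latent scaling is produced by the model conditioned on the ROI mask. We also assume the mask has already been resized to the latent resolution.

```python
from typing import List

Grid = List[List[float]]
Mask = List[List[int]]  # 1 = region of interest, 0 = background


def roi_weighted_mse(x: Grid, x_hat: Grid, mask: Mask, bg_weight: float = 0.1) -> float:
    """Implicit-model idea (sketch): squared error averaged over all pixels, with
    background pixels (mask == 0) down-weighted by the hypothetical factor bg_weight."""
    total, count = 0.0, 0
    for row_x, row_xh, row_m in zip(x, x_hat, mask):
        for a, b, m in zip(row_x, row_xh, row_m):
            weight = 1.0 if m else bg_weight
            total += weight * (a - b) ** 2
            count += 1
    return total / count


def roi_scaled_quantize(y: Grid, mask: Mask, roi_bin: float = 1.0, bg_bin: float = 4.0) -> Grid:
    """Explicit-model idea (sketch): per-location quantization bin width chosen from the mask.

    Rounding y / s and multiplying back by s quantizes with bin width s, so a wider
    background bin gives coarser (cheaper, more distorted) latents outside the ROI.
    """
    out: Grid = []
    for row_y, row_m in zip(y, mask):
        row_out = []
        for v, m in zip(row_y, row_m):
            s = roi_bin if m else bg_bin  # hypothetical fixed bin widths
            row_out.append(s * round(v / s))
        out.append(row_out)
    return out
```

For example, with the default bins, `roi_scaled_quantize([[0.4, 1.6], [2.3, 5.9]], [[1, 0], [1, 0]])` keeps the ROI column on a unit grid but snaps the background column to multiples of 4. Widening the background bins maps more latent values onto the same symbol, which is what lets an entropy coder spend fewer bits there at the cost of larger background error.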

Lossy Compression with Distortion Constrained Optimization

May 08, 2020

Ties van Rozendaal, Guillaume Sautière, Taco S. Cohen

Abstract: When training end-to-end learned models for lossy compression, one has to balance the rate and distortion losses. This is typically done by manually setting a tradeoff parameter $\beta$, an approach called $\beta$-VAE. Using this approach it is difficult to target a specific rate or distortion value, because the result can be very sensitive to $\beta$, and the appropriate value for $\beta$ depends on the model and problem setup. As a result, model comparison requires extensive per-model $\beta$-tuning and producing a whole rate-distortion curve (by varying $\beta$) for each model to be compared. We argue that the constrained optimization method of Rezende and Viola (2018) is much more appropriate for training lossy compression models because it allows us to obtain the best possible rate subject to a distortion constraint. This enables pointwise model comparisons: train two models with the same distortion target and compare their rates. We show that the method does manage to satisfy the constraint on a realistic image compression task, outperforms a constrained optimization method based on a hinge loss, and is more practical to use for model selection than a $\beta$-VAE.

* Accepted as a CVPR 2020 workshop paper: Workshop and Challenge on Learned Image Compression (CLIC)
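
To make the contrast with a fixed $\beta$ concrete, here is a toy sketch (our illustration, not the paper's training code). We assume a hypothetical one-parameter "model" with rate $R(b) = b$ bits and the Gaussian-source distortion curve $D(b) = \sigma^2 2^{-2b}$. Instead of hand-tuning $\beta$, the Lagrange multiplier $\lambda$ on the constraint $D \le D_{\text{target}}$ is adapted until the constraint is met. Two simplifications are assumptions: the primal step is solved in closed form rather than by SGD, and $\lambda$ is updated multiplicatively in the log domain, $\lambda \leftarrow \lambda (D / D_{\text{target}})^{\eta}$, which is not necessarily the exact rule of Rezende and Viola. All function names are hypothetical.

```python
import math
from typing import Tuple


def primal_argmin(lam: float, sigma2: float) -> float:
    """Bits b minimising the Lagrangian b + lam * D(b) for the toy model D(b) = sigma2 * 2**(-2b).

    Setting the derivative 1 - 2*ln(2)*lam*sigma2*2**(-2b) to zero gives
    b = 0.5 * log2(2*ln(2)*lam*sigma2). A real codec would take SGD steps here instead.
    """
    return 0.5 * math.log2(2.0 * math.log(2.0) * lam * sigma2)


def distortion(b: float, sigma2: float) -> float:
    """Toy distortion-rate curve of a Gaussian source with variance sigma2."""
    return sigma2 * 2.0 ** (-2.0 * b)


def constrained_rd(d_target: float, sigma2: float = 1.0, lam: float = 1.0,
                   eta: float = 0.5, steps: int = 60) -> Tuple[float, float, float]:
    """Minimise rate subject to D <= d_target by adapting the multiplier lam (toy sketch)."""
    for _ in range(steps):
        b = primal_argmin(lam, sigma2)
        d = distortion(b, sigma2)
        # Log-domain multiplier update: lam grows while D is above target, shrinks below it.
        lam *= (d / d_target) ** eta
    b = primal_argmin(lam, sigma2)
    return b, distortion(b, sigma2), lam
```

In this toy, `constrained_rd(0.01)` settles at about 3.32 bits, i.e. $\tfrac12 \log_2(1/0.01)$, with $D$ equal to the target, while a "better" toy model with `sigma2=0.5` meets the same target with 0.5 bit less. Comparing rates at a shared distortion target in this way is the pointwise comparison the abstract describes; with a fixed $\beta$, each model would instead land at a different, model-dependent distortion.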

Feedback Recurrent AutoEncoder

Nov 11, 2019

Yang Yang, Guillaume Sautière, J. Jon Ryu, Taco S. Cohen

Abstract: In this work, we propose a new recurrent autoencoder architecture, termed Feedback Recurrent AutoEncoder (FRAE), for online compression of sequential data with temporal dependency. The recurrent structure of FRAE is designed to efficiently extract the redundancy along the time dimension and allows a compact discrete representation of the data to be learned. We demonstrate its effectiveness in speech spectrogram compression. Specifically, we show that the FRAE, paired with a powerful neural vocoder, can produce high-quality speech waveforms at a low, fixed bitrate. We further show that by adding a learned prior for the latent space and using an entropy coder, we can achieve an even lower variable bitrate.
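
The "feedback" in a feedback recurrent autoencoder refers to the data flow: the decoder's recurrent state is fed back to the encoder, so the encoder can spend bits only on what the decoder cannot already infer from its own state. The toy below is a hedged sketch of that wiring only. The learned recurrent encoder and decoder are replaced by a hypothetical fixed linear state update (`decay`) and a uniform rounding quantizer (`step`), both our assumptions, and it omits the vocoder, learned prior, and entropy coder.

```python
from typing import List, Tuple


class ToyFeedbackCodec:
    """Toy sketch of feedback recurrent coding (hypothetical; not the FRAE architecture)."""

    def __init__(self, step: float = 0.1, decay: float = 0.9) -> None:
        self.step = step    # quantizer bin width (hypothetical parameter)
        self.decay = decay  # how strongly the previous state is carried over (hypothetical)

    def decoder_step(self, h_prev: float, z: int) -> float:
        """New decoder state from the previous state and the received integer code.

        In this toy, the new state doubles as the reconstruction x_hat_t.
        """
        return self.decay * h_prev + self.step * z

    def encoder_step(self, x: float, h_prev: float) -> int:
        """Encode x_t given the fed-back decoder state: only the unpredicted part is quantized."""
        return round((x - self.decay * h_prev) / self.step)

    def encode(self, xs: List[float]) -> Tuple[List[int], List[float]]:
        """Return the integer codes plus the encoder-side reconstructions."""
        codes: List[int] = []
        recon: List[float] = []
        h = 0.0
        for x in xs:
            z = self.encoder_step(x, h)
            h = self.decoder_step(h, z)  # encoder runs a copy of the decoder to track its state
            codes.append(z)
            recon.append(h)
        return codes, recon

    def decode(self, codes: List[int]) -> List[float]:
        """Rebuild the sequence from the codes alone, as a receiver would."""
        recon: List[float] = []
        h = 0.0
        for z in codes:
            h = self.decoder_step(h, z)
            recon.append(h)
        return recon
```

Because the encoder runs its own copy of the decoder state update, a receiver calling `decode(codes)` reproduces the encoder-side reconstruction exactly from the integer codes, and in this toy each sample's error stays within half a quantizer step.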
