Ties van Rozendaal

Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference

Nov 27, 2024

Andrii Skliar, Ties van Rozendaal, Romain Lepert, Todor Boinovski, Mart van Baalen, Markus Nagel, Paul Whatmough, Babak Ehteshami Bejnordi

Abstract:Mixture of Experts (MoE) LLMs have recently gained attention for their ability to enhance performance by selectively engaging specialized subnetworks or "experts" for each input. However, deploying MoEs on memory-constrained devices remains challenging, particularly when generating tokens sequentially with a batch size of one, as opposed to typical high-throughput settings involving long sequences or large batches. In this work, we optimize MoE on memory-constrained devices where only a subset of expert weights fit in DRAM. We introduce a novel cache-aware routing strategy that leverages expert reuse during token generation to improve cache locality. We evaluate our approach on language modeling, MMLU, and GSM8K benchmarks and present on-device results demonstrating 2$\times$ speedups on mobile devices, offering a flexible, training-free solution to extend MoE's applicability across real-world applications.
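The idea of cache-aware routing can be illustrated with a small sketch. The abstract does not describe the actual routing rule, so everything here is an assumption made for illustration: an additive `bonus` on the router logits of experts already resident in DRAM, an LRU eviction policy, and the hypothetical names `cache_aware_top_k`, `LRUExpertCache`, and `count_misses`.

```python
from collections import OrderedDict


def cache_aware_top_k(logits, cached, k, bonus):
    """Pick k experts after adding `bonus` to the logits of cached experts.

    `bonus` is an assumed knob: 0.0 gives plain top-k routing, larger values
    favour experts that are already in the cache.
    """
    adjusted = [l + (bonus if i in cached else 0.0) for i, l in enumerate(logits)]
    ranked = sorted(range(len(logits)), key=lambda i: adjusted[i], reverse=True)
    return ranked[:k]


class LRUExpertCache:
    """Toy stand-in for the subset of expert weights that fits in DRAM."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._slots = OrderedDict()
        self.misses = 0  # each miss models one expert load from slower storage

    def __contains__(self, expert_id):
        return expert_id in self._slots

    def touch(self, expert_id):
        if expert_id in self._slots:
            self._slots.move_to_end(expert_id)  # mark as most recently used
            return
        self.misses += 1
        if len(self._slots) >= self.capacity:
            self._slots.popitem(last=False)  # evict least recently used expert
        self._slots[expert_id] = True


def count_misses(token_logits, k, capacity, bonus):
    """Route one token at a time (batch size one) and count cache misses."""
    cache = LRUExpertCache(capacity)
    for logits in token_logits:
        for expert_id in cache_aware_top_k(logits, cache, k, bonus):
            cache.touch(expert_id)
    return cache.misses


# Router preferences alternate between experts {0, 1} and {2, 3} per token.
tokens = [
    [1.0, 0.9, 0.1, 0.0],
    [0.1, 0.0, 1.0, 0.9],
    [1.0, 0.9, 0.1, 0.0],
    [0.1, 0.0, 1.0, 0.9],
]
plain_misses = count_misses(tokens, k=2, capacity=2, bonus=0.0)
cache_aware_misses = count_misses(tokens, k=2, capacity=2, bonus=2.0)
```

In this toy setup, plain top-k routing reloads both experts on every token, while the biased router keeps reusing the two cached experts. The sketch ignores the accuracy cost of overriding the router, which is the trade-off any such scheme has to manage.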

MobileNVC: Real-time 1080p Neural Video Compression on a Mobile Device

Oct 02, 2023

Ties van Rozendaal, Tushar Singhal, Hoang Le, Guillaume Sautiere, Amir Said, Krishna Buska, Anjuman Raha, Dimitris Kalatzis, Hitarth Mehta, Frank Mayer (+3 more)

Abstract:Neural video codecs have recently become competitive with standard codecs such as HEVC in the low-delay setting. However, most neural codecs are large floating-point networks that use pixel-dense warping operations for temporal modeling, making them too computationally expensive for deployment on mobile devices. Recent work has demonstrated that running a neural decoder in real time on mobile is feasible, but shows this only for 720p RGB video, while the YUV420 format is more commonly used in production. This work presents the first neural video codec that decodes 1080p YUV420 video in real time on a mobile device. Our codec relies on two major contributions. First, we design an efficient codec that uses a block-based motion compensation algorithm available on the warping core of the mobile accelerator, and we show how to quantize this model to integer precision. Second, we implement a fast decoder pipeline that concurrently runs neural network components on the neural signal processor, parallel entropy coding on the mobile GPU, and warping on the warping core. Our codec outperforms the previous on-device codec by a large margin with up to 48% BD-rate savings, while reducing the MAC count on the receiver side by 10x. We perform a careful ablation to demonstrate the effect of the introduced motion compensation scheme, and ablate the effect of model quantization.

Implicit Neural Video Compression

Dec 21, 2021

Yunfan Zhang, Ties van Rozendaal, Johann Brehmer, Markus Nagel, Taco Cohen

Abstract:We propose a method to compress full-resolution video sequences with implicit neural representations. Each frame is represented as a neural network that maps coordinate positions to pixel values. We use a separate implicit network to modulate the coordinate inputs, which enables efficient motion compensation between frames. Together with a small residual network, this allows us to efficiently compress P-frames relative to the previous frame. We further lower the bitrate by storing the network weights with learned integer quantization. Our method, which we call implicit pixel flow (IPF), offers several simplifications over established neural video codecs: it does not require the receiver to have access to a pretrained neural network, does not use expensive interpolation-based warping operations, and does not require a separate training dataset. We demonstrate the feasibility of neural implicit compression on image and video data.
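The way a P-frame is built from the previous frame, a coordinate modulation, and a residual can be sketched as function composition. This is only a toy reading of the abstract: plain Python callables stand in for the implicit networks, and the names `make_p_frame`, `i_frame`, `shift_right`, and `no_residual` are hypothetical.

```python
from typing import Callable, Tuple

# A "frame" here is any function from pixel coordinates to a pixel value.
Field = Callable[[float, float], float]
# A "flow" maps coordinates to a displacement (dx, dy).
Flow = Callable[[float, float], Tuple[float, float]]


def make_p_frame(prev_frame: Field, flow: Flow, residual: Field) -> Field:
    """Compose a P-frame: query the previous frame at displaced coordinates,
    then add a residual correction. No interpolation-based warping is needed
    because the previous frame can be evaluated at any coordinate."""

    def frame(x: float, y: float) -> float:
        dx, dy = flow(x, y)
        return prev_frame(x + dx, y + dy) + residual(x, y)

    return frame


# Hand-written stand-ins for the learned networks (assumptions, not trained).
i_frame: Field = lambda x, y: x + 10.0 * y   # toy image
shift_right: Flow = lambda x, y: (1.0, 0.0)  # constant one-pixel motion
no_residual: Field = lambda x, y: 0.0        # nothing left to correct

p_frame = make_p_frame(i_frame, shift_right, no_residual)
value = p_frame(2.0, 3.0)  # reads i_frame at (3.0, 3.0)
```

In the method described above, each of these callables would be a small network whose quantized weights are what gets transmitted; the sketch only shows how the pieces fit together.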

Instance-Adaptive Video Compression: Improving Neural Codecs by Training on the Test Set

Nov 19, 2021

Ties van Rozendaal, Johann Brehmer, Yunfan Zhang, Reza Pourreza, Taco S. Cohen

Abstract:We introduce a video compression algorithm based on instance-adaptive learning. On each video sequence to be transmitted, we finetune a pretrained compression model. The optimal parameters are transmitted to the receiver along with the latent code. By entropy-coding the parameter updates under a suitable mixture model prior, we ensure that the network parameters can be encoded efficiently. This instance-adaptive compression algorithm is agnostic about the choice of base model and has the potential to improve any neural video codec. On UVG, HEVC, and Xiph datasets, our codec improves on a low-latency scale-space flow model with BD-rate savings of 21% to 26%, and on a state-of-the-art B-frame model with BD-rate savings of 17% to 20%. We also demonstrate that instance-adaptive finetuning improves the robustness to domain shift. Finally, our approach reduces the capacity requirements on compression models. We show that it enables state-of-the-art performance even after reducing the network size by 72%.

Overfitting for Fun and Profit: Instance-Adaptive Data Compression

Jan 21, 2021

Ties van Rozendaal, Iris A. M. Huijben, Taco S. Cohen

Abstract:Neural data compression has been shown to outperform classical methods in terms of $RD$ performance, with results still improving rapidly. At a high level, neural compression is based on an autoencoder that tries to reconstruct the input instance from a (quantized) latent representation, coupled with a prior that is used to losslessly compress these latents. Due to limitations on model capacity and imperfect optimization and generalization, such models will suboptimally compress test data in general. However, one of the great strengths of learned compression is that if the test-time data distribution is known and relatively low-entropy (e.g. a camera watching a static scene, a dash cam in an autonomous car, etc.), the model can easily be finetuned or adapted to this distribution, leading to improved $RD$ performance. In this paper we take this concept to the extreme, adapting the full model to a single video, and sending model updates (quantized and compressed using a parameter-space prior) along with the latent representation. Unlike previous work, we finetune not only the encoder/latents but the entire model, and - during finetuning - take into account both the effect of model quantization and the additional costs incurred by sending the model updates. We evaluate an image compression model on I-frames (sampled at 2 fps) from videos of the Xiph dataset, and demonstrate that full-model adaptation improves $RD$ performance by ~1 dB, with respect to encoder-only finetuning.

* Accepted at ICLR 2021

Lossy Compression with Distortion Constrained Optimization

May 08, 2020

Ties van Rozendaal, Guillaume Sautière, Taco S. Cohen

Abstract:When training end-to-end learned models for lossy compression, one has to balance the rate and distortion losses. This is typically done by manually setting a tradeoff parameter $\beta$, an approach called $\beta$-VAE. Using this approach it is difficult to target a specific rate or distortion value, because the result can be very sensitive to $\beta$, and the appropriate value for $\beta$ depends on the model and problem setup. As a result, model comparison requires extensive per-model $\beta$-tuning, and producing a whole rate-distortion curve (by varying $\beta$) for each model to be compared. We argue that the constrained optimization method of Rezende and Viola, 2018 is a lot more appropriate for training lossy compression models because it allows us to obtain the best possible rate subject to a distortion constraint. This enables pointwise model comparisons, by training two models with the same distortion target and comparing their rate. We show that the method does manage to satisfy the constraint on a realistic image compression task, outperforms a constrained optimization method based on a hinge-loss, and is more practical to use for model selection than a $\beta$-VAE.

* Accepted as a CVPR 2020 workshop paper: Workshop and Challenge on Learned Image Compression (CLIC)
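
The contrast between the two training objectives in the abstract above can be written out roughly as follows. The notation is assumed here rather than taken from the paper: $R(\theta)$ and $D(\theta)$ denote the rate and distortion of a model with parameters $\theta$, $D_{\text{target}}$ is the distortion target, and $\lambda$ is a Lagrange multiplier. Which term $\beta$ multiplies is also an assumption.

```latex
\begin{align}
  % beta-VAE style objective: a fixed, hand-tuned tradeoff
  \mathcal{L}_{\beta}(\theta) &= D(\theta) + \beta \, R(\theta) \\
  % constrained formulation: best rate subject to a distortion target
  \min_{\theta} \; R(\theta) \quad &\text{s.t.} \quad D(\theta) \le D_{\text{target}} \\
  % Lagrangian relaxation: lambda is optimized alongside theta
  \min_{\theta} \, \max_{\lambda \ge 0} \; & R(\theta) + \lambda \, \bigl( D(\theta) - D_{\text{target}} \bigr)
\end{align}
```

In the last line $\lambda$ grows while the constraint is violated and shrinks once it is met, so the tradeoff is set by the target $D_{\text{target}}$ instead of by a hand-tuned $\beta$. The exact multiplier update rule used by Rezende and Viola is not given in the abstract and is not reproduced here.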

Video Compression With Rate-Distortion Autoencoders

Aug 14, 2019

Amirhossein Habibian, Ties van Rozendaal, Jakub M. Tomczak, Taco S. Cohen

Abstract:In this paper we present a deep generative model for lossy video compression. We employ a model that consists of a 3D autoencoder with a discrete latent space and an autoregressive prior used for entropy coding. Both autoencoder and prior are trained jointly to minimize a rate-distortion loss, which is closely related to the ELBO used in variational autoencoders. Despite its simplicity, we find that our method outperforms the state-of-the-art learned video compression networks based on motion compensation or interpolation. We systematically evaluate various design choices, such as the use of frame-based or spatio-temporal autoencoders, and the type of autoregressive prior. In addition, we present three extensions of the basic method that demonstrate the benefits over classical approaches to compression. First, we introduce semantic compression, where the model is trained to allocate more bits to objects of interest. Second, we study adaptive compression, where the model is adapted to a domain with limited variability, e.g., videos taken from an autonomous car, to achieve superior compression on that domain. Finally, we introduce multimodal compression, where we demonstrate the effectiveness of our model in joint compression of multiple modalities captured by non-standard imaging sensors, such as quad cameras. We believe that this opens up novel video compression applications, which have not been feasible with classical codecs.

* Accepted to ICCV 2019
