Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Seungeon Kim

Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity

Mar 05, 2024

Hagyeong Lee, Minkyu Kim, Jun-Hyuk Kim, Seungeon Kim, Dokwan Oh, Jaeho Lee

Abstract:Recent advances in text-guided image compression have shown great potential to enhance the perceptual quality of reconstructed images. These methods, however, tend to have significantly degraded pixel-wise fidelity, limiting their practicality. To fill this gap, we develop a new text-guided image compression algorithm that achieves both high perceptual and pixel-wise fidelity. In particular, we propose a compression framework that leverages text information mainly by text-adaptive encoding and training with joint image-text loss. By doing so, we avoid decoding based on text-guided generative models -- known for high generative diversity -- and effectively utilize the semantic information of text at a global level. Experimental results on various datasets show that our method can achieve high pixel-level and perceptual quality, with either human- or machine-generated captions. In particular, our method outperforms all baselines in terms of LPIPS, with some room for even more improvements when we use more carefully generated captions.

* The first two authors contributed equally

Via

Access Paper or Ask Questions

Neural Video Compression with Temporal Layer-Adaptive Hierarchical B-frame Coding

Sep 05, 2023

Yeongwoong Kim, Suyong Bahk, Seungeon Kim, Won Hee Lee, Dokwan Oh, Hui Yong Kim

Figure 1 for Neural Video Compression with Temporal Layer-Adaptive Hierarchical B-frame Coding

Figure 2 for Neural Video Compression with Temporal Layer-Adaptive Hierarchical B-frame Coding

Figure 3 for Neural Video Compression with Temporal Layer-Adaptive Hierarchical B-frame Coding

Figure 4 for Neural Video Compression with Temporal Layer-Adaptive Hierarchical B-frame Coding

Abstract:Neural video compression (NVC) is a rapidly evolving video coding research area, with some models achieving superior coding efficiency compared to the latest video coding standard Versatile Video Coding (VVC). In conventional video coding standards, the hierarchical B-frame coding, which utilizes a bidirectional prediction structure for higher compression, had been well-studied and exploited. In NVC, however, limited research has investigated the hierarchical B scheme. In this paper, we propose an NVC model exploiting hierarchical B-frame coding with temporal layer-adaptive optimization. We first extend an existing unidirectional NVC model to a bidirectional model, which achieves -21.13% BD-rate gain over the unidirectional baseline model. However, this model faces challenges when applied to sequences with complex or large motions, leading to performance degradation. To address this, we introduce temporal layer-adaptive optimization, incorporating methods such as temporal layer-adaptive quality scaling (TAQS) and temporal layer-adaptive latent scaling (TALS). The final model with the proposed methods achieves an impressive BD-rate gain of -39.86% against the baseline. It also resolves the challenges in sequences with large or complex motions with up to -49.13% more BD-rate gains than the simple bidirectional extension. This improvement is attributed to the allocation of more bits to lower temporal layers, thereby enhancing overall reconstruction quality with smaller bits. Since our method has little dependency on a specific NVC model architecture, it can serve as a general tool for extending unidirectional NVC models to the ones with hierarchical B-frame coding.

Via

Access Paper or Ask Questions