Hyomin Choi

Semantics-Guided Generative Image Compression

May 29, 2025

Cheng-Lin Wu, Hyomin Choi, Ivan V. Bajić

Abstract: Advancements in text-to-image generative AI with large multimodal models are spreading into the field of image compression, creating high-quality representations of images at extremely low bit rates. This work introduces novel components to the existing multimodal image semantic compression (MISC) approach, enhancing the quality of the generated images in terms of PSNR and perceptual metrics. The new components include semantic segmentation guidance for the generative decoder, as well as content-adaptive diffusion, which controls the number of diffusion steps based on image characteristics. The results show that our newly introduced methods significantly improve on the baseline MISC model while also decreasing the complexity. As a result, both the encoding and decoding times are reduced by more than 36%. Moreover, the proposed compression framework outperforms mainstream codecs in terms of perceptual similarity and quality. The code and visual examples are available.

* 6 pages, 4 figures, IEEE ICIP 2025

A Scalable Crawling Algorithm Utilizing Noisy Change-Indicating Signals

Feb 04, 2025

Róbert Busa-Fekete, Julian Zimmert, András György, Linhai Qiu, Tzu-Wei Sung, Hao Shen, Hyomin Choi, Sharmila Subramaniam, Li Xiao

Abstract: Web refresh crawling is the problem of keeping a cache of web pages fresh, that is, having the most recent copy available when a page is requested, given a limited bandwidth available to the crawler. Under the assumption that the change and request events for each web page follow independent Poisson processes, the optimal scheduling policy was derived by Azar et al. (2018). In this paper, we study an extension of this problem in which side information indicating content changes is available, such as various types of web pings (for example, signals from sitemaps or content delivery networks). Incorporating such side information into the crawling policy is challenging, because (i) the signals can be noisy with false positive events and with missing change events; and (ii) the crawler should achieve a fair performance over web pages regardless of the quality of the side information, which might differ from web page to web page. We propose a scalable crawling algorithm which (i) uses the noisy side information in an optimal way under mild assumptions; (ii) can be deployed without heavy centralized computation; (iii) is able to crawl web pages at a constant total rate without spikes in the total bandwidth usage over any time interval, and automatically adapt to the new optimal solution when the total bandwidth changes without centralized computation. Experiments clearly demonstrate the versatility of our approach.
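
The Poisson change model mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's scheduling algorithm and it ignores request events and side information entirely. It only shows the standard consequence of the Poisson assumption: a page with change rate `change_rate` is still fresh after `elapsed` time with probability exp(-change_rate * elapsed). The function names and the periodic recrawl schedule in `average_freshness` are assumptions made for illustration.

```python
import math


def prob_still_fresh(change_rate: float, elapsed: float) -> float:
    """Probability that no change occurred in `elapsed` time units,
    assuming changes follow a Poisson process with rate `change_rate`."""
    return math.exp(-change_rate * elapsed)


def average_freshness(change_rate: float, crawl_interval: float) -> float:
    """Time-averaged freshness of a page recrawled every `crawl_interval`.

    Assumption (not from the paper): a fixed periodic schedule. The value is
    the mean of exp(-rate * t) over t in [0, interval].
    """
    x = change_rate * crawl_interval
    if x == 0:
        return 1.0
    # -expm1(-x) equals 1 - exp(-x), computed accurately for small x.
    return -math.expm1(-x) / x


if __name__ == "__main__":
    # A page that changes on average twice per day, crawled at several intervals.
    for interval in (0.1, 0.5, 1.0):
        print(interval, average_freshness(2.0, interval))
```

Shorter crawl intervals raise the average freshness but cost more bandwidth, which is the trade-off a refresh policy has to balance across pages.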

Variable-Rate Learned Image Compression with Multi-Objective Optimization and Quantization-Reconstruction Offsets

Feb 29, 2024

Fatih Kamisli, Fabien Racape, Hyomin Choi

Abstract: Achieving successful variable bitrate compression with computationally simple algorithms from a single end-to-end learned image or video compression model remains a challenge. Many approaches have been proposed, including conditional auto-encoders, channel-adaptive gains for the latent tensor, or uniformly quantizing all elements of the latent tensor. This paper follows the traditional approach of varying a single quantization step size to perform uniform quantization of all latent tensor elements. However, three modifications are proposed to improve the variable rate compression performance. First, multi-objective optimization is used for (post-)training. Second, a quantization-reconstruction offset is introduced into the quantization operation. Third, variable rate quantization is also used for the hyper latent. All these modifications can be made on a pre-trained single-rate compression model by performing post-training. The algorithms are implemented in three well-known image compression models, and the achieved variable rate compression results indicate negligible or minimal compression performance loss compared to training multiple models. (Code will be shared at https://github.com/InterDigitalInc/CompressAI)

* Accepted as a paper at DCC 2024
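
As a rough illustration of varying a single quantization step size, the sketch below quantizes a list of latent values uniformly and reconstructs them with an optional offset. The abstract does not specify the form of the quantization-reconstruction offset, so the choice here (pulling each nonzero reconstruction level toward zero by `offset * step`) is an assumption borrowed from conventional codec practice, and the function names are hypothetical.

```python
import math


def quantize(values, step):
    """Uniform quantization: map each value to an integer index.

    Rounds half away from zero so that positive and negative values
    are treated symmetrically.
    """
    return [int(math.copysign(math.floor(abs(v) / step + 0.5), v)) for v in values]


def dequantize(indices, step, offset=0.0):
    """Reconstruct values from integer indices.

    Assumed offset form (not taken from the paper): nonzero levels are
    shifted toward zero by `offset * step`.
    """
    out = []
    for k in indices:
        if k > 0:
            out.append((k - offset) * step)
        elif k < 0:
            out.append((k + offset) * step)
        else:
            out.append(0.0)
    return out


if __name__ == "__main__":
    latent = [0.9, -2.6, 0.2, 4.4]
    for step in (0.5, 1.0, 2.0):  # a larger step gives coarser indices
        q = quantize(latent, step)
        print(step, q, dequantize(q, step, offset=0.1))
```

A larger step maps more values to the same index, which lowers the rate at the cost of higher distortion; a single model can then cover several rates by changing only `step`.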

Learned Disentangled Latent Representations for Scalable Image Coding for Humans and Machines

Jan 10, 2023

Ezgi Ozyilkan, Mateen Ulhaq, Hyomin Choi, Fabien Racape

Abstract: As an increasing amount of image and video content will be analyzed by machines, there is demand for a new codec paradigm that is capable of compressing visual input primarily for the purpose of computer vision inference, while secondarily supporting input reconstruction. In this work, we propose a learned compression architecture that can be used to build such a codec. We introduce a novel variational formulation that explicitly takes feature data relevant to the desired inference task as input at the encoder side. As such, our learned scalable image codec encodes and transmits two disentangled latent representations for object detection and input reconstruction. We note that compared to relevant benchmarks, our proposed scheme yields a more compact latent representation that is specialized for the inference task. Our experiments show that our proposed system achieves bit rate savings of 40.6% on the primary object detection task compared to the current state-of-the-art, albeit with some degradation in performance for the secondary input reconstruction task.

* Accepted as a paper at DCC 2023

Frequency-aware Learned Image Compression for Quality Scalability

Jan 03, 2023

Hyomin Choi, Fabien Racape, Shahab Hamidi-Rad, Mateen Ulhaq, Simon Feltman

Abstract: Spatial frequency analysis and transforms play a central role in most engineered image and video lossy codecs, but are rarely employed in neural network (NN)-based approaches. We propose a novel NN-based image coding framework that utilizes forward wavelet transforms to decompose the input signal by spatial frequency. Our encoder generates separate bitstreams for each latent representation of low and high frequencies. This enables our decoder to selectively decode bitstreams in a quality-scalable manner. Hence, the decoder can produce an enhanced image by using an enhancement bitstream in addition to the base bitstream. Furthermore, our method is able to enhance only a specific region of interest (ROI) by using a corresponding part of the enhancement latent representation. Our experiments demonstrate that the proposed method shows competitive rate-distortion performance compared to several non-scalable image codecs. We also showcase the effectiveness of our two-level quality scalability, as well as its practicality in ROI quality enhancement.

* Presented at VCIP'22
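
To make the idea of decomposing an image by spatial frequency concrete, here is a minimal single-level 2D Haar transform in plain Python. The abstract does not say which wavelet the authors use, so Haar, the averaging normalization, and the subband names are all assumptions. The sketch also shows the two-level quality idea in its simplest form: decoding from the low band alone gives a coarse image, and adding the detail bands restores the input exactly.

```python
def haar2d(img):
    """Single-level 2D Haar decomposition of an image with even height and width.

    Returns the low-frequency band and a tuple of three detail bands.
    """
    low, det_h, det_v, det_d = [], [], [], []
    for i in range(0, len(img), 2):
        rl, rh, rv, rd = [], [], [], []
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            rl.append((a + b + c + d) / 4)  # local average
            rh.append((a - b + c - d) / 4)  # left/right difference
            rv.append((a + b - c - d) / 4)  # top/bottom difference
            rd.append((a - b - c + d) / 4)  # diagonal difference
        low.append(rl); det_h.append(rh); det_v.append(rv); det_d.append(rd)
    return low, (det_h, det_v, det_d)


def inverse_haar2d(low, details=None):
    """Invert haar2d. With details=None, only the low band is used (base quality)."""
    rows, cols = len(low), len(low[0])
    if details is None:
        zeros = [[0.0] * cols for _ in range(rows)]
        details = (zeros, zeros, zeros)
    det_h, det_v, det_d = details
    img = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            l, h, v, d = low[i][j], det_h[i][j], det_v[i][j], det_d[i][j]
            img[2 * i][2 * j] = l + h + v + d
            img[2 * i][2 * j + 1] = l - h + v - d
            img[2 * i + 1][2 * j] = l + h - v - d
            img[2 * i + 1][2 * j + 1] = l - h - v + d
    return img


if __name__ == "__main__":
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    low, details = haar2d(image)
    print(inverse_haar2d(low))           # base quality: 2x2 block averages
    print(inverse_haar2d(low, details))  # enhanced: exact reconstruction
```

In the framework described above, the bands would feed learned encoders that produce separate bitstreams; this sketch covers only the fixed transform step.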

Scalable Video Coding for Humans and Machines

Aug 04, 2022

Hyomin Choi, Ivan V. Bajić

Abstract: Video content is watched not only by humans, but increasingly also by machines. For example, machine learning models analyze surveillance video for security and traffic monitoring, search through YouTube videos for inappropriate content, and so on. In this paper, we propose a scalable video coding framework that supports machine vision (specifically, object detection) through its base layer bitstream and human vision via its enhancement layer bitstream. The proposed framework includes components from both conventional and Deep Neural Network (DNN)-based video coding. The results show that on object detection, the proposed framework achieves 13-19% bit savings compared to state-of-the-art video codecs, while remaining competitive in terms of MS-SSIM on the human vision task.

* 6 pages, 5 figures, IEEE MMSP 2022

Joint Image Compression and Denoising via Latent-Space Scalability

May 04, 2022

Saeed Ranjbar Alvar, Mateen Ulhaq, Hyomin Choi, Ivan V. Bajić

Abstract: When it comes to image compression in digital cameras, denoising is traditionally performed prior to compression. However, there are applications where image noise may be necessary to demonstrate the trustworthiness of the image, such as court evidence and image forensics. This means that the noise needs to be coded in addition to the clean image. In this paper, we present a learnt image compression framework where image denoising and compression are performed jointly. The latent space of the image codec is organized in a scalable manner such that the clean image can be decoded from a subset of the latent space at a lower rate, while the noisy image is decoded from the full latent space at a higher rate. The proposed codec is compared against established compression and denoising benchmarks, and the experiments reveal considerable bitrate savings of up to 80% compared to cascade compression and denoising.

SFU-HW-Tracks-v1: Object Tracking Dataset on Raw Video Sequences

Dec 30, 2021

Takehiro Tanaka, Hyomin Choi, Ivan V. Bajić

Abstract: We present a dataset that contains object annotations with unique object identities (IDs) for the High Efficiency Video Coding (HEVC) v1 Common Test Conditions (CTC) sequences. Ground-truth annotations for 13 sequences were prepared and released as the dataset called SFU-HW-Tracks-v1. For each video frame, ground-truth annotations include object class ID, object ID, and bounding box location and dimensions. The dataset can be used to evaluate object tracking performance on uncompressed video sequences and study the relationship between video compression and object tracking.

* 4 pages, 3 figures, submitted to Data in Brief

Scalable Image Coding for Humans and Machines

Jul 18, 2021

Hyomin Choi, Ivan V. Bajić

Abstract: At present, and increasingly so in the future, much of the captured visual content will not be seen by humans. Instead, it will be used for automated machine vision analytics and may require occasional human viewing. Examples of such applications include traffic monitoring, visual surveillance, autonomous navigation, and industrial machine vision. To address such requirements, we develop an end-to-end learned image codec whose latent space is designed to support scalability from simpler to more complicated tasks. The simplest task is assigned to a subset of the latent space (the base layer), while more complicated tasks make use of additional subsets of the latent space, i.e., both the base and enhancement layer(s). For the experiments, we establish a 2-layer and a 3-layer model, each of which offers input reconstruction for human vision, plus machine vision task(s), and compare them with relevant benchmarks. The experiments show that our scalable codecs offer 37%-80% bitrate savings on machine vision tasks compared to the best alternatives, while being comparable to state-of-the-art image codecs in terms of input reconstruction.

* Submitted for peer review to IEEE Transactions
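
The layered latent-space organization described in the abstract can be sketched as simple channel bookkeeping. In this hypothetical example the latent is a list of channels, `layer_sizes` gives the number of channels per layer (base first), and a task at level k reads the base layer plus the first k enhancement layers. The names, sizes, and list representation are assumptions for illustration; the actual codec learns these representations end to end.

```python
def split_layers(latent, layer_sizes):
    """Partition latent channels into consecutive layers (base layer first)."""
    if sum(layer_sizes) != len(latent):
        raise ValueError("layer sizes must add up to the number of channels")
    layers, start = [], 0
    for size in layer_sizes:
        layers.append(latent[start:start + size])
        start += size
    return layers


def channels_for_task(latent, layer_sizes, task_level):
    """Channels needed by a task at `task_level`.

    Level 0 uses the base layer only; each higher level adds one
    enhancement layer on top of everything below it.
    """
    count = sum(layer_sizes[:task_level + 1])
    return latent[:count]


if __name__ == "__main__":
    latent = [f"ch{i}" for i in range(8)]  # placeholder channel labels
    sizes = [2, 3, 3]                      # hypothetical 3-layer split
    print(split_layers(latent, sizes))
    print(channels_for_task(latent, sizes, 0))  # simplest machine task
    print(channels_for_task(latent, sizes, 2))  # full input reconstruction
```

Only the channels returned for a given level need to be transmitted and decoded, which is where the bitrate savings on the simpler tasks come from.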

Latent-space scalability for multi-task collaborative intelligence

May 21, 2021

Hyomin Choi, Ivan V. Bajić

Abstract: We investigate latent-space scalability for multi-task collaborative intelligence, where one of the tasks is object detection and the other is input reconstruction. In our proposed approach, part of the latent space can be selectively decoded to support object detection while the remainder can be decoded when input reconstruction is needed. Such an approach allows reduced computational resources when only object detection is required, and this can be achieved without reconstructing input pixels. By varying the scaling factors of various terms in the training loss function, the system can be trained to achieve various trade-offs between object detection accuracy and input reconstruction quality. Experiments are conducted to demonstrate the adjustable system performance on the two tasks compared to the relevant benchmarks.

* To be presented at IEEE ICIP'21
