Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mateen Ulhaq

Feature Compression for Machines with Range-Based Channel Truncation and Frame Packing

Dec 11, 2025

Juan Merlos, Fabien Racapé, Hyomin Choi, Mateen Ulhaq, Hari Kalva

Abstract:This paper proposes a method that enhances the compression performance of the current model under development for the upcoming MPEG standard on Feature Coding for Machines (FCM). This standard aims at providing inter-operable compressed bitstreams of features in the context of split computing, i.e., when the inference of a large computer vision neural-network (NN)-based model is split between two devices. Intermediate features can consist of multiple 3D tensors that can be reduced and entropy coded to limit the required bandwidth of such transmission. In the envisioned design for the MPEG-FCM standard, intermediate feature tensors may be reduced using Neural layers before being converted into 2D video frames that can be coded using existing video compression standards. This paper introduces an additional channel truncation and packing method which enables the system to preserve the relevant channels, depending on the statistics of the features at inference time, while preserving the computer vision task performance at the receiver. Implemented within the MPEG-FCM test model, the proposed method yields an average reduction in rate by 10.59% for a given accuracy on multiple computer vision tasks and datasets.

* 2025 Data Compression Conference (DCC), Snowbird, UT, USA, 2025, pp. 392-392
* 10 pages, 8 figures. Extended version of the paper with the same title presented at IEEE DCC 2025

Via

Access Paper or Ask Questions

Efficient Feature Compression for Machines with Global Statistics Preservation

Dec 10, 2025

Md Eimran Hossain Eimon, Hyomin Choi, Fabien Racapé, Mateen Ulhaq, Velibor Adzic, Hari Kalva, Borko Furht

Figure 1 for Efficient Feature Compression for Machines with Global Statistics Preservation

Figure 2 for Efficient Feature Compression for Machines with Global Statistics Preservation

Figure 3 for Efficient Feature Compression for Machines with Global Statistics Preservation

Figure 4 for Efficient Feature Compression for Machines with Global Statistics Preservation

Abstract:The split-inference paradigm divides an artificial intelligence (AI) model into two parts. This necessitates the transfer of intermediate feature data between the two halves. Here, effective compression of the feature data becomes vital. In this paper, we employ Z-score normalization to efficiently recover the compressed feature data at the decoder side. To examine the efficacy of our method, the proposed method is integrated into the latest Feature Coding for Machines (FCM) codec standard under development by the Moving Picture Experts Group (MPEG). Our method supersedes the existing scaling method used by the current standard under development. It both reduces the overhead bits and improves the end-task accuracy. To further reduce the overhead in certain circumstances, we also propose a simplified method. Experiments show that using our proposed method shows 17.09% reduction in bitrate on average across different tasks and up to 65.69% for object tracking without sacrificing the task accuracy.

* 2025 IEEE International Symposium on Circuits and Systems (ISCAS), London, United Kingdom, 2025, pp. 1-5

Via

Access Paper or Ask Questions

Learned Compression for Images and Point Clouds

Sep 12, 2024

Mateen Ulhaq

Abstract:Over the last decade, deep learning has shown great success at performing computer vision tasks, including classification, super-resolution, and style transfer. Now, we apply it to data compression to help build the next generation of multimedia codecs. This thesis provides three primary contributions to this new field of learned compression. First, we present an efficient low-complexity entropy model that dynamically adapts the encoding distribution to a specific input by compressing and transmitting the encoding distribution itself as side information. Secondly, we propose a novel lightweight low-complexity point cloud codec that is highly specialized for classification, attaining significant reductions in bitrate compared to non-specialized codecs. Lastly, we explore how motion within the input domain between consecutive video frames is manifested in the corresponding convolutionally-derived latent space.

* 65 pages, 21 figures, Master's Thesis, defended in 2023

Via

Access Paper or Ask Questions

Learned Compression of Encoding Distributions

Jun 18, 2024

Mateen Ulhaq, Ivan V. Bajić

Abstract:The entropy bottleneck introduced by Ball\'e et al. is a common component used in many learned compression models. It encodes a transformed latent representation using a static distribution whose parameters are learned during training. However, the actual distribution of the latent data may vary wildly across different inputs. The static distribution attempts to encompass all possible input distributions, thus fitting none of them particularly well. This unfortunate phenomenon, sometimes known as the amortization gap, results in suboptimal compression. To address this issue, we propose a method that dynamically adapts the encoding distribution to match the latent data distribution for a specific input. First, our model estimates a better encoding distribution for a given input. This distribution is then compressed and transmitted as an additional side-information bitstream. Finally, the decoder reconstructs the encoding distribution and uses it to decompress the corresponding latent data. Our method achieves a Bj{\o}ntegaard-Delta (BD)-rate gain of -7.10% on the Kodak test dataset when applied to the standard fully-factorized architecture. Furthermore, considering computational complexity, the transform used by our method is an order of magnitude cheaper in terms of Multiply-Accumulate (MAC) operations compared to related side-information methods such as the scale hyperprior.

* 7 pages, 5 figures, IEEE ICIP 2024

Via

Access Paper or Ask Questions

Scalable Human-Machine Point Cloud Compression

Feb 23, 2024

Mateen Ulhaq, Ivan V. Bajić

Abstract:Due to the limited computational capabilities of edge devices, deep learning inference can be quite expensive. One remedy is to compress and transmit point cloud data over the network for server-side processing. Unfortunately, this approach can be sensitive to network factors, including available bitrate. Luckily, the bitrate requirements can be reduced without sacrificing inference accuracy by using a machine task-specialized codec. In this paper, we present a scalable codec for point-cloud data that is specialized for the machine task of classification, while also providing a mechanism for human viewing. In the proposed scalable codec, the "base" bitstream supports the machine task, and an "enhancement" bitstream may be used for better input reconstruction performance for human viewing. We base our architecture on PointNet++, and test its efficacy on the ModelNet40 dataset. We show significant improvements over prior non-specialized codecs.

* 5 pages, 4 figures, 2024 Picture Coding Symposium (PCS)

Via

Access Paper or Ask Questions

Learned Point Cloud Compression for Classification

Aug 11, 2023

Mateen Ulhaq, Ivan V. Bajić

Abstract:Deep learning is increasingly being used to perform machine vision tasks such as classification, object detection, and segmentation on 3D point cloud data. However, deep learning inference is computationally expensive. The limited computational capabilities of end devices thus necessitate a codec for transmitting point cloud data over the network for server-side processing. Such a codec must be lightweight and capable of achieving high compression ratios without sacrificing accuracy. Motivated by this, we present a novel point cloud codec that is highly specialized for the machine task of classification. Our codec, based on PointNet, achieves a significantly better rate-accuracy trade-off in comparison to alternative methods. In particular, it achieves a 94% reduction in BD-bitrate over non-specialized codecs on the ModelNet40 dataset. For low-resource end devices, we also propose two lightweight configurations of our encoder that achieve similar BD-bitrate reductions of 93% and 92% with 3% and 5% drops in top-1 accuracy, while consuming only 0.470 and 0.048 encoder-side kMACs/point, respectively. Our codec demonstrates the potential of specialized codecs for machine analysis of point clouds, and provides a basis for extension to more complex tasks and datasets in the future.

* 6 pages, 4 figures, IEEE MMSP 2023

Via

Access Paper or Ask Questions

Mobile-Cloud Inference for Collaborative Intelligence

Jun 24, 2023

Mateen Ulhaq

Abstract:As AI applications for mobile devices become more prevalent, there is an increasing need for faster execution and lower energy consumption for deep learning model inference. Historically, the models run on mobile devices have been smaller and simpler in comparison to large state-of-the-art research models, which can only run on the cloud. However, cloud-only inference has drawbacks such as increased network bandwidth consumption and higher latency. In addition, cloud-only inference requires the input data (images, audio) to be fully transferred to the cloud, creating concerns about potential privacy breaches. There is an alternative approach: shared mobile-cloud inference. Partial inference is performed on the mobile in order to reduce the dimensionality of the input data and arrive at a compact feature tensor, which is a latent space representation of the input signal. The feature tensor is then transmitted to the server for further inference. This strategy can reduce inference latency, energy consumption, and network bandwidth usage, as well as provide privacy protection, because the original signal never leaves the mobile. Further performance gain can be achieved by compressing the feature tensor before its transmission.

* 56 pages, 20 figures, Bachelor's Thesis, defended in 2020

Via

Access Paper or Ask Questions

Learned Disentangled Latent Representations for Scalable Image Coding for Humans and Machines

Jan 10, 2023

Ezgi Ozyilkan, Mateen Ulhaq, Hyomin Choi, Fabien Racape

Figure 1 for Learned Disentangled Latent Representations for Scalable Image Coding for Humans and Machines

Figure 2 for Learned Disentangled Latent Representations for Scalable Image Coding for Humans and Machines

Figure 3 for Learned Disentangled Latent Representations for Scalable Image Coding for Humans and Machines

Figure 4 for Learned Disentangled Latent Representations for Scalable Image Coding for Humans and Machines

Abstract:As an increasing amount of image and video content will be analyzed by machines, there is demand for a new codec paradigm that is capable of compressing visual input primarily for the purpose of computer vision inference, while secondarily supporting input reconstruction. In this work, we propose a learned compression architecture that can be used to build such a codec. We introduce a novel variational formulation that explicitly takes feature data relevant to the desired inference task as input at the encoder side. As such, our learned scalable image codec encodes and transmits two disentangled latent representations for object detection and input reconstruction. We note that compared to relevant benchmarks, our proposed scheme yields a more compact latent representation that is specialized for the inference task. Our experiments show that our proposed system achieves a bit rate savings of 40.6% on the primary object detection task compared to the current state-of-the-art, albeit with some degradation in performance for the secondary input reconstruction task.

* accepted as a paper for DCC 2023

Via

Access Paper or Ask Questions

Frequency-aware Learned Image Compression for Quality Scalability

Jan 03, 2023

Hyomin Choi, Fabien Racape, Shahab Hamidi-Rad, Mateen Ulhaq, Simon Feltman

Figure 1 for Frequency-aware Learned Image Compression for Quality Scalability

Figure 2 for Frequency-aware Learned Image Compression for Quality Scalability

Figure 3 for Frequency-aware Learned Image Compression for Quality Scalability

Figure 4 for Frequency-aware Learned Image Compression for Quality Scalability

Abstract:Spatial frequency analysis and transforms serve a central role in most engineered image and video lossy codecs, but are rarely employed in neural network (NN)-based approaches. We propose a novel NN-based image coding framework that utilizes forward wavelet transforms to decompose the input signal by spatial frequency. Our encoder generates separate bitstreams for each latent representation of low and high frequencies. This enables our decoder to selectively decode bitstreams in a quality-scalable manner. Hence, the decoder can produce an enhanced image by using an enhancement bitstream in addition to the base bitstream. Furthermore, our method is able to enhance only a specific region of interest (ROI) by using a corresponding part of the enhancement latent representation. Our experiments demonstrate that the proposed method shows competitive rate-distortion performance compared to several non-scalable image codecs. We also showcase the effectiveness of our two-level quality scalability, as well as its practicality in ROI quality enhancement.

* Presented at VCIP'22

Via

Access Paper or Ask Questions

Joint Image Compression and Denoising via Latent-Space Scalability

May 04, 2022

Saeed Ranjbar Alvar, Mateen Ulhaq, Hyomin Choi, Ivan V. Bajić

Figure 1 for Joint Image Compression and Denoising via Latent-Space Scalability

Figure 2 for Joint Image Compression and Denoising via Latent-Space Scalability

Figure 3 for Joint Image Compression and Denoising via Latent-Space Scalability

Figure 4 for Joint Image Compression and Denoising via Latent-Space Scalability

Abstract:When it comes to image compression in digital cameras, denoising is traditionally performed prior to compression. However, there are applications where image noise may be necessary to demonstrate the trustworthiness of the image, such as court evidence and image forensics. This means that noise itself needs to be coded, in addition to the clean image itself. In this paper, we present a learnt image compression framework where image denoising and compression are performed jointly. The latent space of the image codec is organized in a scalable manner such that the clean image can be decoded from a subset of the latent space at a lower rate, while the noisy image is decoded from the full latent space at a higher rate. The proposed codec is compared against established compression and denoising benchmarks, and the experiments reveal considerable bitrate savings of up to 80% compared to cascade compression and denoising.

Via

Access Paper or Ask Questions