Friedrich-Alexander Universität Erlangen-Nürnberg
Abstract: We propose a monocular depth estimation method based on visual autoregressive (VAR) priors, offering an alternative to diffusion-based approaches. Our method adapts a large-scale text-to-image VAR model and introduces a scale-wise conditional upsampling mechanism with classifier-free guidance. Our approach performs inference in ten fixed autoregressive stages, requires only 74K synthetic samples for fine-tuning, and achieves competitive results. We report state-of-the-art performance on indoor benchmarks under constrained training conditions and strong performance when applied to outdoor datasets. This work establishes autoregressive priors as a complementary family of geometry-aware generative models for depth estimation, highlighting advantages in data scalability and adaptability to 3D vision tasks. Code is available at "https://github.com/AmirMaEl/VAR-Depth".
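As an illustration of the classifier-free guidance step mentioned above, the following minimal PyTorch sketch blends conditional and unconditional per-scale token logits; the tensor shapes and guidance scale are illustrative assumptions, not the released VAR-Depth implementation.

```python
import torch

def cfg_logits(cond_logits: torch.Tensor,
               uncond_logits: torch.Tensor,
               guidance_scale: float = 2.0) -> torch.Tensor:
    """Classifier-free guidance: move the prediction away from the
    unconditional estimate and towards the conditional one."""
    return uncond_logits + guidance_scale * (cond_logits - uncond_logits)

# Toy usage with per-scale token logits: [batch, tokens at this scale, vocabulary].
cond = torch.randn(1, 16, 4096)
uncond = torch.randn(1, 16, 4096)
guided = cfg_logits(cond, uncond, guidance_scale=3.0)
print(guided.shape)  # torch.Size([1, 16, 4096])
```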
Abstract: Reducing computational complexity remains a critical challenge for the widespread adoption of learning-based image compression techniques. In this work, we propose TreeNet, a novel low-complexity image compression model that leverages a binary tree-structured encoder-decoder architecture for efficient representation and reconstruction. We employ an attentional feature fusion mechanism to effectively integrate features from multiple branches. We evaluate TreeNet on three widely used benchmark datasets and compare its performance against competing methods, including JPEG AI, a recent standard in learning-based image compression. At low bitrates, TreeNet achieves an average improvement of 4.83% in BD-rate over JPEG AI while reducing model complexity by 87.82%. Furthermore, we conduct extensive ablation studies to investigate the influence of the various latent representations within TreeNet, offering deeper insights into the factors contributing to reconstruction quality.
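As a rough illustration of the attentional feature fusion mentioned in the TreeNet abstract above, the following sketch blends two feature branches with a learned channel-wise gate; it is a generic formulation, not the TreeNet module itself.

```python
import torch
import torch.nn as nn

class AttentionalFusion(nn.Module):
    """Blend two feature branches with a learned channel-wise gate
    (generic sketch, not the TreeNet module)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        w = self.gate(a + b)          # per-channel weights in [0, 1]
        return w * a + (1.0 - w) * b  # convex combination of the two branches

fused = AttentionalFusion(64)(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(fused.shape)  # torch.Size([1, 64, 32, 32])
```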




Abstract: Each image acquisition setup leads to its own camera-specific image characteristics that degrade image quality. In learning-based perception algorithms, characteristics that occur during the application phase but are absent from the training data lead to a domain gap that impedes performance. Previously, pixel-level domain adaptation through unpaired learning of the pristine-to-distorted mapping function has been proposed. In this work, we propose shallow discriminator architectures to address limitations of these approaches. We show that a smaller receptive field improves the learning of unknown image distortions by more accurately reproducing local distortion characteristics at low network complexity. In a domain adaptation setup for instance segmentation, we achieve mean average precision increases over previous methods of up to 0.15 for individual distortions and up to 0.16 for camera-specific image characteristics in a simplified camera model. In terms of the number of parameters, our approach matches the complexity of one state-of-the-art method while reducing complexity by a factor of 20 compared to another, demonstrating superior efficiency without compromising performance.
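To make the receptive-field argument above concrete, the following sketch shows a deliberately shallow, PatchGAN-style discriminator whose two 3x3 convolutions yield a 5x5 receptive field; the layer widths and depth are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

def shallow_discriminator(in_ch: int = 3, base_ch: int = 32) -> nn.Sequential:
    """Deliberately shallow PatchGAN-style discriminator: two 3x3 convolutions
    give a 5x5 receptive field, so only local distortion statistics are judged."""
    return nn.Sequential(
        nn.Conv2d(in_ch, base_ch, kernel_size=3, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(base_ch, 1, kernel_size=3, padding=1),  # one real/fake score per patch
    )

scores = shallow_discriminator()(torch.randn(1, 3, 64, 64))
print(scores.shape)  # torch.Size([1, 1, 64, 64])
```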
Abstract: Live video streaming and user-generated content streamed from battery-powered devices are ubiquitous today. Live streaming requires real-time video encoding, and hardware video encoders are well suited for such an encoding task. In this paper, we introduce a high-level feature model using Gaussian process regression that predicts the encoding energy of a hardware video encoder. In an evaluation setup restricted to P-frames and a single keyframe, the model predicts the encoding energy with a mean absolute percentage error of approximately 9%. Further, we demonstrate with an ablation study that spatial resolution is a key high-level feature for encoding energy prediction of a hardware encoder. A practical application of our model is a prior estimation of the energy required to encode a video at various spatial resolutions, with different coding standards, and with different codec presets.
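A hedged sketch of the modelling idea above, using scikit-learn's Gaussian process regressor on hypothetical high-level features; the feature set and energy values are made up for illustration and may differ from the paper's.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Toy high-level features per sequence: [megapixels per frame, frame rate, preset index].
X = np.array([[0.92, 30, 0],
              [2.07, 30, 0],
              [2.07, 60, 1],
              [8.29, 30, 1]])
y = np.array([2.1, 4.0, 4.9, 15.3])  # measured encoding energy in joules (illustrative)

gpr = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gpr.fit(X, y)

# Prior estimate for an unseen configuration, with predictive uncertainty.
mean, std = gpr.predict(np.array([[3.69, 30, 0]]), return_std=True)
print(f"predicted energy: {mean[0]:.2f} J (+/- {std[0]:.2f} J)")
```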
Abstract: The soft context formation coder is a pixel-wise, state-of-the-art lossless screen content coder that combines pattern matching and color palette coding with arithmetic coding. It achieves excellent compression performance on screen content images in RGB 4:4:4 format with few distinct colors. In contrast to many other lossless compression methods, it codes entire color pixels at once, i.e., all color components of a pixel are coded together. Consequently, it does not natively support image formats with downsampled chroma, such as YCbCr 4:2:0, a chroma format frequently used in video compression. In this paper, we extend the soft context formation coder to 4:2:0 image compression by successively coding the Y and CbCr planes based on an analysis of the normalized mutual information between image planes. Additionally, we propose an enhancement to the chroma prediction based on the luminance plane. Furthermore, we propose to transmit side information about occurring luma-chroma combinations to improve chroma probability distribution modelling. Averaged over a large screen content image dataset, our proposed method outperforms HEVC-SCC, which requires 5.66% more bitrate than our method.
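As an illustration of the plane-wise analysis mentioned above, the following sketch estimates normalized mutual information between two co-located image planes by treating sample values as discrete symbols; it is a toy computation on random data, not the paper's analysis pipeline.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def plane_nmi(plane_a: np.ndarray, plane_b: np.ndarray) -> float:
    """Normalized mutual information between two co-located image planes,
    treating 8-bit sample values as discrete symbols (toy computation)."""
    return normalized_mutual_info_score(plane_a.ravel(), plane_b.ravel())

# Toy example: a luma plane and a chroma plane that partly follows it.
rng = np.random.default_rng(0)
y_plane = rng.integers(0, 256, size=(32, 32))
cb_plane = (y_plane // 2 + rng.integers(0, 4, size=(32, 32))) % 256
print(f"NMI(Y, Cb) = {plane_nmi(y_plane, cb_plane):.3f}")
```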
Abstract: Current learned image compression models typically exhibit high complexity, which demands significant computational resources. To overcome this challenge, we propose an innovative approach that employs hierarchical feature extraction transforms to significantly reduce complexity while preserving bit rate reduction efficiency. Our architecture achieves this by using fewer channels for high-resolution inputs and feature maps, whereas feature maps with a large number of channels have reduced spatial dimensions, thereby cutting the computational load without sacrificing performance. This strategy reduces the forward pass complexity from \(1256 \, \text{kMAC/Pixel}\) to just \(270 \, \text{kMAC/Pixel}\). As a result, the reduced-complexity model opens the way for learned image compression to operate efficiently across various devices and paves the way for new architectures in image compression.
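To make the channel/resolution trade-off above tangible, the following back-of-the-envelope sketch counts MACs per input pixel for two hypothetical convolutional schedules; the layer configurations and resulting numbers are illustrative and unrelated to the reported 1256 and 270 kMAC/Pixel figures.

```python
def kmac_per_input_pixel(layers) -> float:
    """MACs per input pixel for a stack of convolutions, ignoring biases.
    Each layer is (in_ch, out_ch, kernel, downsampling factor of its output
    relative to the input): it costs k*k*c_in*c_out / s^2 MACs per input pixel."""
    return sum(k * k * c_in * c_out / (s * s) for c_in, c_out, k, s in layers) / 1000.0

# Hypothetical schedules: few channels at full resolution vs. wide layers everywhere.
hierarchical = [(3, 32, 5, 1), (32, 64, 5, 2), (64, 128, 5, 4), (128, 192, 5, 8)]
uniform = [(3, 192, 5, 1), (192, 192, 5, 2), (192, 192, 5, 4), (192, 192, 5, 8)]
print(f"hierarchical: {kmac_per_input_pixel(hierarchical):.0f} kMAC/pixel")
print(f"uniform:      {kmac_per_input_pixel(uniform):.0f} kMAC/pixel")
```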




Abstract: Efficient compression of 360-degree video content requires advanced motion models for inter-frame prediction. The Motion Plane Adaptive (MPA) motion model projects the frames onto multiple perspective planes in 3D space and improves motion compensation by estimating the motion on those planes with a translational diamond search. In this work, we enhance this motion model with an affine parameterization and motion estimation method, which provides a feasible trade-off between the quality of the reconstructed frames and the computational cost. The affine motion estimation is performed with the inverse compositional Lucas-Kanade algorithm. The proposed method improves motion compensation significantly: the motion-compensated frame achieves a Weighted-to-Spherically-uniform Peak Signal-to-Noise Ratio (WS-PSNR) about 1.6 dB higher than with the conventional MPA. In a basic video codec, the improved inter prediction can lead to Bjøntegaard Delta (BD) rate savings between 9% and 35%, depending on the block size (BS) and the number of motion parameters.
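For reference, the six-parameter affine motion model commonly used with Lucas-Kanade estimation maps block coordinates as sketched below; this is the standard parameterization, not the MPA implementation itself.

```python
import numpy as np

def affine_warp_coords(h: int, w: int, p: np.ndarray) -> np.ndarray:
    """Map block coordinates through the six-parameter affine motion model
    x' = (1 + p0)*x + p1*y + p2,  y' = p3*x + (1 + p4)*y + p5."""
    ys, xs = np.mgrid[0:h, 0:w]
    xw = (1 + p[0]) * xs + p[1] * ys + p[2]
    yw = p[3] * xs + (1 + p[4]) * ys + p[5]
    return np.stack([xw, yw], axis=-1)

# Sanity check: a pure translation by (2, -1) uses only p2 and p5.
coords = affine_warp_coords(4, 4, np.array([0.0, 0.0, 2.0, 0.0, 0.0, -1.0]))
print(coords[0, 0])  # [ 2. -1.]
```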




Abstract: Developing effective 360-degree (spherical) image compression techniques is crucial for technologies like virtual reality and automated driving. This paper advances the state of the art of the on-the-sphere learning (OSLO) framework for omnidirectional image compression by proposing spherical attention modules, residual blocks, and a spatial autoregressive context model. These improvements achieve a 23.1% bit rate reduction in terms of WS-PSNR BD rate. Additionally, we introduce a spherical transposed convolution operator for upsampling, which reduces the number of trainable parameters by a factor of four compared to the pixel shuffling used in the OSLO framework, while maintaining similar compression performance. In total, our proposed method offers significant rate savings with a smaller architecture and can be applied to any application based on spherical convolutions.
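The factor-of-four parameter argument above can be reproduced with planar stand-ins for the two upsampling options, as in the following sketch; the paper's operators are spherical, but the counting argument is the same, and the channel and kernel sizes here are arbitrary assumptions.

```python
import torch.nn as nn

def n_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

C, K = 128, 3
# Planar stand-ins for the two 2x upsampling options.
pixel_shuffle_up = nn.Sequential(nn.Conv2d(C, 4 * C, K, padding=1), nn.PixelShuffle(2))
transposed_up = nn.ConvTranspose2d(C, C, K, stride=2, padding=1, output_padding=1)

# The convolution feeding the pixel shuffle needs 4x the output channels,
# hence roughly 4x the weights of the transposed convolution.
print(n_params(pixel_shuffle_up), n_params(transposed_up))  # 590336 147584
```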
Abstract: Current image compression models often require a separate model for each quality level, making them resource-intensive in terms of both training and storage. To address these limitations, we propose an innovative approach that utilizes the latent variables of pre-existing trained models, such as the Stable Diffusion Variational Autoencoder, for perceptual image compression. Our method eliminates the need for distinct models dedicated to different quality levels. We employ overfitted learnable functions to compress the latent representation of the target model at any desired quality level. These overfitted functions operate in the latent space, ensuring low computational complexity of around $25.5$ MAC/pixel for a forward pass on images of $(1363 \times 2048)$ pixels. This approach uses resources efficiently during both training and decoding. Our method achieves perceptual quality comparable to state-of-the-art learned image compression models while being both model-agnostic and resolution-agnostic, opening up new possibilities for the development of innovative image compression methods.
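A minimal sketch of the overfitting idea above, assuming a frozen pre-trained VAE and a tiny latent-space transform trained per quality level; the quantization proxy, rate proxy, and trade-off weight below are generic placeholders, not the paper's components.

```python
import torch
import torch.nn as nn

# One frozen-encoder latent for one image (toy data) and one target quality level.
latent = torch.randn(1, 4, 64, 64)
transform = nn.Conv2d(4, 4, kernel_size=1)  # the small overfitted function
opt = torch.optim.Adam(transform.parameters(), lr=1e-3)
lam = 0.01  # rate-distortion weight chosen for this quality level

for _ in range(200):
    y = transform(latent)
    y_hat = y + (torch.rand_like(y) - 0.5)       # differentiable quantization proxy
    distortion = (y_hat - latent).pow(2).mean()  # stay close to the original latent
    rate_proxy = y_hat.abs().mean()              # crude stand-in for coding cost
    loss = distortion + lam * rate_proxy
    opt.zero_grad()
    loss.backward()
    opt.step()
```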




Abstract: Remote sensing cross-modal text-image retrieval (RSCTIR) has gained attention for its utility in information mining. However, challenges remain in effectively integrating global and local information, owing to variations in remote sensing imagery, and in ensuring proper feature pre-alignment before modal fusion, both of which affect retrieval accuracy and efficiency. To address these issues, we propose CMPAGL, a cross-modal pre-aligned method leveraging global and local information. Our Gswin transformer block combines local window self-attention and global-local window cross-attention to capture multi-scale features. A pre-alignment mechanism simplifies modal fusion training, improving retrieval performance. Additionally, we introduce a similarity matrix reweighting (SMR) algorithm for reranking and enhance the triplet loss function with an intra-class distance term to optimize feature learning. Experiments on four datasets, including RSICD and RSITMD, validate CMPAGL's effectiveness, achieving improvements of up to 4.65% in R@1 and 2.28% in mean Recall (mR) over state-of-the-art methods.
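As a generic illustration of augmenting the triplet loss with an intra-class distance term, as mentioned above: the margin, weight, and exact formulation below are assumptions for the sketch, not the CMPAGL loss.

```python
import torch
import torch.nn.functional as F

def triplet_with_intra_class(anchor, positive, negative,
                             margin: float = 0.2, alpha: float = 0.1):
    """Triplet loss plus a term that directly penalizes the anchor-positive
    (intra-class) distance."""
    d_ap = F.pairwise_distance(anchor, positive)
    d_an = F.pairwise_distance(anchor, negative)
    return (F.relu(d_ap - d_an + margin) + alpha * d_ap).mean()

a, p, n = (torch.randn(8, 256) for _ in range(3))
print(triplet_with_intra_class(a, p, n).item())
```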