Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ioannis Katsavounidis

AIM 2024 Challenge on Efficient Video Super-Resolution for AV1 Compressed Content

Sep 25, 2024

Marcos V Conde, Zhijun Lei, Wen Li, Christos Bampis, Ioannis Katsavounidis, Radu Timofte

Figure 1 for AIM 2024 Challenge on Efficient Video Super-Resolution for AV1 Compressed Content

Figure 2 for AIM 2024 Challenge on Efficient Video Super-Resolution for AV1 Compressed Content

Figure 3 for AIM 2024 Challenge on Efficient Video Super-Resolution for AV1 Compressed Content

Figure 4 for AIM 2024 Challenge on Efficient Video Super-Resolution for AV1 Compressed Content

Abstract:Video super-resolution (VSR) is a critical task for enhancing low-bitrate and low-resolution videos, particularly in streaming applications. While numerous solutions have been developed, they often suffer from high computational demands, resulting in low frame rates (FPS) and poor power efficiency, especially on mobile platforms. In this work, we compile different methods to address these challenges, the solutions are end-to-end real-time video super-resolution frameworks optimized for both high performance and low runtime. We also introduce a new test set of high-quality 4K videos to further validate the approaches. The proposed solutions tackle video up-scaling for two applications: 540p to 4K (x4) as a general case, and 360p to 1080p (x3) more tailored towards mobile devices. In both tracks, the solutions have a reduced number of parameters and operations (MACs), allow high FPS, and improve VMAF and PSNR over interpolation baselines. This report gauges some of the most efficient video super-resolution methods to date.

* European Conference on Computer Vision (ECCV) 2024 - Advances in Image Manipulation (AIM)

Via

Access Paper or Ask Questions

SVT-AV1 Encoding Bitrate Estimation Using Motion Search Information

Jul 08, 2024

Lena Eichermüller, Gaurang Chaudhari, Ioannis Katsavounidis, Zhijun Lei, Hassene Tmar, Christian Herglotz, André Kaup

Figure 1 for SVT-AV1 Encoding Bitrate Estimation Using Motion Search Information

Figure 2 for SVT-AV1 Encoding Bitrate Estimation Using Motion Search Information

Figure 3 for SVT-AV1 Encoding Bitrate Estimation Using Motion Search Information

Figure 4 for SVT-AV1 Encoding Bitrate Estimation Using Motion Search Information

Abstract:Enabling high compression efficiency while keeping encoding energy consumption at a low level, requires prioritization of which videos need more sophisticated encoding techniques. However, the effects vary highly based on the content, and information on how good a video can be compressed is required. This can be measured by estimating the encoded bitstream size prior to encoding. We identified the errors between estimated motion vectors from Motion Search, an algorithm that predicts temporal changes in videos, correlates well to the encoded bitstream size. Combining Motion Search with Random Forests, the encoding bitrate can be estimated with a Pearson correlation of above 0.96.

* 5 pages, 4 figures, accepted for European Signal Processing Conference (EUSIPCO) 2024

Via

Access Paper or Ask Questions

Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

Apr 25, 2024

Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun(+65 more)

Figure 1 for Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

Figure 2 for Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

Figure 3 for Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

Figure 4 for Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

Abstract:This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF codec, instead of JPEG. All the proposed methods improve PSNR fidelity over Lanczos interpolation, and process images under 10ms. Out of the 160 participants, 25 teams submitted their code and models. The solutions present novel designs tailored for memory-efficiency and runtime on edge devices. This survey describes the best solutions for real-time SR of compressed high-resolution images.

* CVPR 2024, AI for Streaming (AIS) Workshop

Via

Access Paper or Ask Questions

Cut-FUNQUE: An Objective Quality Model for Compressed Tone-Mapped High Dynamic Range Videos

Apr 20, 2024

Abhinau K. Venkataramanan, Cosmin Stejerean, Ioannis Katsavounidis, Hassene Tmar, Alan C. Bovik

Figure 1 for Cut-FUNQUE: An Objective Quality Model for Compressed Tone-Mapped High Dynamic Range Videos

Figure 2 for Cut-FUNQUE: An Objective Quality Model for Compressed Tone-Mapped High Dynamic Range Videos

Figure 3 for Cut-FUNQUE: An Objective Quality Model for Compressed Tone-Mapped High Dynamic Range Videos

Figure 4 for Cut-FUNQUE: An Objective Quality Model for Compressed Tone-Mapped High Dynamic Range Videos

Abstract:High Dynamic Range (HDR) videos have enjoyed a surge in popularity in recent years due to their ability to represent a wider range of contrast and color than Standard Dynamic Range (SDR) videos. Although HDR video capture has seen increasing popularity because of recent flagship mobile phones such as Apple iPhones, Google Pixels, and Samsung Galaxy phones, a broad swath of consumers still utilize legacy SDR displays that are unable to display HDR videos. As result, HDR videos must be processed, i.e., tone-mapped, before streaming to a large section of SDR-capable video consumers. However, server-side tone-mapping involves automating decisions regarding the choices of tone-mapping operators (TMOs) and their parameters to yield high-fidelity outputs. Moreover, these choices must be balanced against the effects of lossy compression, which is ubiquitous in streaming scenarios. In this work, we develop a novel, efficient model of objective video quality named Cut-FUNQUE that is able to accurately predict the visual quality of tone-mapped and compressed HDR videos. Finally, we evaluate Cut-FUNQUE on a large-scale crowdsourced database of such videos and show that it achieves state-of-the-art accuracy.

Via

Access Paper or Ask Questions

Joint Quality Assessment and Example-Guided Image Processing by Disentangling Picture Appearance from Content

Apr 20, 2024

Abhinau K. Venkataramanan, Cosmin Stejerean, Ioannis Katsavounidis, Hassene Tmar, Alan C. Bovik

Figure 1 for Joint Quality Assessment and Example-Guided Image Processing by Disentangling Picture Appearance from Content

Figure 2 for Joint Quality Assessment and Example-Guided Image Processing by Disentangling Picture Appearance from Content

Figure 3 for Joint Quality Assessment and Example-Guided Image Processing by Disentangling Picture Appearance from Content

Figure 4 for Joint Quality Assessment and Example-Guided Image Processing by Disentangling Picture Appearance from Content

Abstract:The deep learning revolution has strongly impacted low-level image processing tasks such as style/domain transfer, enhancement/restoration, and visual quality assessments. Despite often being treated separately, the aforementioned tasks share a common theme of understanding, editing, or enhancing the appearance of input images without modifying the underlying content. We leverage this observation to develop a novel disentangled representation learning method that decomposes inputs into content and appearance features. The model is trained in a self-supervised manner and we use the learned features to develop a new quality prediction model named DisQUE. We demonstrate through extensive evaluations that DisQUE achieves state-of-the-art accuracy across quality prediction tasks and distortion types. Moreover, we demonstrate that the same features may also be used for image processing tasks such as HDR tone mapping, where the desired output characteristics may be tuned using example input-output pairs.

Via

Access Paper or Ask Questions

Encoding Time and Energy Model for SVT-AV1 based on Video Complexity

Jan 30, 2024

Lena Eichermüller, Gaurang Chaudhari, Ioannis Katsavounidis, Zhijun Lei, Hassene Tmar, Christian Herglotz, André Kaup

Abstract:The share of online video traffic in global carbon dioxide emissions is growing steadily. To comply with the demand for video media, dedicated compression techniques are continuously optimized, but at the expense of increasingly higher computational demands and thus rising energy consumption at the video encoder side. In order to find the best trade-off between compression and energy consumption, modeling encoding energy for a wide range of encoding parameters is crucial. We propose an encoding time and energy model for SVT-AV1 based on empirical relations between the encoding time and video parameters as well as encoder configurations. Furthermore, we model the influence of video content by established content descriptors such as spatial and temporal information. We then use the predicted encoding time to estimate the required energy demand and achieve a prediction error of 19.6 % for encoding time and 20.9 % for encoding energy.

* 5 pages, 1 figure, accepted for IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2024

Via

Access Paper or Ask Questions

A FUNQUE Approach to the Quality Assessment of Compressed HDR Videos

Dec 13, 2023

Abhinau K. Venkataramanan, Cosmin Stejerean, Ioannis Katsavounidis, Alan C. Bovik

Figure 1 for A FUNQUE Approach to the Quality Assessment of Compressed HDR Videos

Figure 2 for A FUNQUE Approach to the Quality Assessment of Compressed HDR Videos

Figure 3 for A FUNQUE Approach to the Quality Assessment of Compressed HDR Videos

Abstract:Recent years have seen steady growth in the popularity and availability of High Dynamic Range (HDR) content, particularly videos, streamed over the internet. As a result, assessing the subjective quality of HDR videos, which are generally subjected to compression, is of increasing importance. In particular, we target the task of full-reference quality assessment of compressed HDR videos. The state-of-the-art (SOTA) approach HDRMAX involves augmenting off-the-shelf video quality models, such as VMAF, with features computed on non-linearly transformed video frames. However, HDRMAX increases the computational complexity of models like VMAF. Here, we show that an efficient class of video quality prediction models named FUNQUE+ achieves SOTA accuracy. This shows that the FUNQUE+ models are flexible alternatives to VMAF that achieve higher HDR video quality prediction accuracy at lower computational cost.

Via

Access Paper or Ask Questions

Bitrate Ladder Construction using Visual Information Fidelity

Dec 12, 2023

Krishna Srikar Durbha, Hassene Tmar, Cosmin Stejerean, Ioannis Katsavounidis, Alan C. Bovik

Figure 1 for Bitrate Ladder Construction using Visual Information Fidelity

Figure 2 for Bitrate Ladder Construction using Visual Information Fidelity

Figure 3 for Bitrate Ladder Construction using Visual Information Fidelity

Figure 4 for Bitrate Ladder Construction using Visual Information Fidelity

Abstract:Recently proposed perceptually optimized per-title video encoding methods provide better BD-rate savings than fixed bitrate-ladder approaches that have been employed in the past. However, a disadvantage of per-title encoding is that it requires significant time and energy to compute bitrate ladders. Over the past few years, a variety of methods have been proposed to construct optimal bitrate ladders including using low-level features to predict cross-over bitrates, optimal resolutions for each bitrate, predicting visual quality, etc. Here, we deploy features drawn from Visual Information Fidelity (VIF) (VIF features) extracted from uncompressed videos to predict the visual quality (VMAF) of compressed videos. We present multiple VIF feature sets extracted from different scales and subbands of a video to tackle the problem of bitrate ladder construction. Comparisons are made against a fixed bitrate ladder and a bitrate ladder obtained from exhaustive encoding using Bjontegaard delta metrics.

* 4 pages, 4 figures, 2 tables

Via

Access Paper or Ask Questions

Encoder Complexity Control in SVT-AV1 by Speed-Adaptive Preset Switching

Jul 11, 2023

Lena Eichermüller, Gaurang Chaudhari, Ioannis Katsavounidis, Zhijun Lei, Hassene Tmar, André Kaup, Christian Herglotz

Figure 1 for Encoder Complexity Control in SVT-AV1 by Speed-Adaptive Preset Switching

Figure 2 for Encoder Complexity Control in SVT-AV1 by Speed-Adaptive Preset Switching

Figure 3 for Encoder Complexity Control in SVT-AV1 by Speed-Adaptive Preset Switching

Abstract:Current developments in video encoding technology lead to continuously improving compression performance but at the expense of increasingly higher computational demands. Regarding the online video traffic increases during the last years and the concomitant need for video encoding, encoder complexity control mechanisms are required to restrict the processing time to a sufficient extent in order to find a reasonable trade-off between performance and complexity. We present a complexity control mechanism in SVT-AV1 by using speed-adaptive preset switching to comply with the remaining time budget. This method enables encoding with a user-defined time constraint within the complete preset range with an average precision of 8.9 \% without introducing any additional latencies.

* 5 pages, 2 figures, accepted for IEEE International Conference on Image Processing (ICIP) 2023

Via

Access Paper or Ask Questions

Study of Subjective and Objective Quality Assessment of Mobile Cloud Gaming Videos

May 26, 2023

Avinab Saha, Yu-Chih Chen, Chase Davis, Bo Qiu, Xiaoming Wang, Rahul Gowda, Ioannis Katsavounidis, Alan C. Bovik

Abstract:We present the outcomes of a recent large-scale subjective study of Mobile Cloud Gaming Video Quality Assessment (MCG-VQA) on a diverse set of gaming videos. Rapid advancements in cloud services, faster video encoding technologies, and increased access to high-speed, low-latency wireless internet have all contributed to the exponential growth of the Mobile Cloud Gaming industry. Consequently, the development of methods to assess the quality of real-time video feeds to end-users of cloud gaming platforms has become increasingly important. However, due to the lack of a large-scale public Mobile Cloud Gaming Video dataset containing a diverse set of distorted videos with corresponding subjective scores, there has been limited work on the development of MCG-VQA models. Towards accelerating progress towards these goals, we created a new dataset, named the LIVE-Meta Mobile Cloud Gaming (LIVE-Meta-MCG) video quality database, composed of 600 landscape and portrait gaming videos, on which we collected 14,400 subjective quality ratings from an in-lab subjective study. Additionally, to demonstrate the usefulness of the new resource, we benchmarked multiple state-of-the-art VQA algorithms on the database. The new database will be made publicly available on our website: \url{https://live.ece.utexas.edu/research/LIVE-Meta-Mobile-Cloud-Gaming/index.html}

* Accepted to IEEE Transactions on Image Processing, 2023. The database will be publicly available by 1st week of July 2023

Via

Access Paper or Ask Questions