Abstract: Adaptive video streaming allows for the construction of bitrate ladders that deliver perceptually optimized visual quality to viewers under bandwidth constraints. Two common approaches to adaptation are per-title encoding and per-shot encoding. The former encodes each program, movie, or other piece of content in a manner that is perceptually and bandwidth optimized for that content but is otherwise fixed. The latter is a more granular approach that optimizes the encoding parameters for each scene or shot (however defined) of a video. Per-shot video encoding, as pioneered by Netflix, is carried out using the Dynamic Optimizer (DO). Under the control of the VMAF perceptual video quality prediction engine, the DO delivers high-quality videos to millions of viewers at considerably lower bitrates than per-title or fixed bitrate ladder encoding. A variety of per-title and per-shot encoding techniques have recently been proposed that seek to reduce computational overhead and to construct optimal bitrate ladders more efficiently using low-level features extracted from source videos. Here we develop a perceptually optimized method of constructing optimal per-shot bitrate and quality ladders, using an ensemble of low-level features and Visual Information Fidelity (VIF) features extracted from different scales and subbands. Using Bjontegaard delta metrics, we compare the performance of our model, which we call VIF-ladder, against other content-adaptive bitrate ladder prediction methods, counterparts of those methods that we designed to construct quality ladders, a fixed bitrate ladder, and bitrate ladders constructed via exhaustive encoding.
Abstract: Recently proposed perceptually optimized per-title video encoding methods provide better BD-rate savings than the fixed bitrate ladder approaches employed in the past. However, a disadvantage of per-title encoding is that computing bitrate ladders requires significant time and energy. Over the past few years, a variety of methods have been proposed to construct optimal bitrate ladders, including using low-level features to predict cross-over bitrates, the optimal resolution for each bitrate, and visual quality, among others. Here, we use Visual Information Fidelity (VIF) features extracted from uncompressed videos to predict the visual quality (VMAF) of compressed videos. We present multiple VIF feature sets extracted from different scales and subbands of a video to tackle the problem of bitrate ladder construction. Using Bjontegaard delta metrics, comparisons are made against a fixed bitrate ladder and a bitrate ladder obtained from exhaustive encoding.
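
Both abstracts report comparisons in terms of Bjontegaard delta metrics. As a minimal sketch of how a BD-rate figure is commonly computed (this is not the authors' evaluation code), the Python below assumes the classic setup of exactly four rate-quality points per ladder, fits a cubic to log-bitrate as a function of quality, and averages the gap between the two curves over their shared quality range. The names `bd_rate`, `_lagrange`, and `_mean_log_rate` are hypothetical, and the choice of quality metric (for example VMAF) is left to the caller.

```python
import math


def _lagrange(xs, ys, x):
    """Evaluate the polynomial passing through (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _mean_log_rate(quals, log_rates, lo, hi):
    """Mean of the cubic log-rate curve over [lo, hi].

    Simpson's rule is exact for cubic polynomials, so three
    evaluations give the exact average of the fitted curve.
    """
    mid = 0.5 * (lo + hi)
    return (
        _lagrange(quals, log_rates, lo)
        + 4.0 * _lagrange(quals, log_rates, mid)
        + _lagrange(quals, log_rates, hi)
    ) / 6.0


def bd_rate(rates_ref, quals_ref, rates_test, quals_test):
    """Average bitrate difference (percent) of test vs. reference.

    Assumption: each curve has exactly four points with distinct
    quality values, so the cubic fit is an exact interpolation.
    Negative values mean the test ladder needs less bitrate for
    the same quality.
    """
    if not all(len(v) == 4 for v in (rates_ref, quals_ref, rates_test, quals_test)):
        raise ValueError("this sketch expects exactly four points per curve")

    # Compare only over the quality range covered by both curves.
    lo = max(min(quals_ref), min(quals_test))
    hi = min(max(quals_ref), max(quals_test))
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")

    log_ref = [math.log(r) for r in rates_ref]
    log_test = [math.log(r) for r in rates_test]

    diff = _mean_log_rate(quals_test, log_test, lo, hi) - _mean_log_rate(
        quals_ref, log_ref, lo, hi
    )
    return (math.exp(diff) - 1.0) * 100.0
```

Calling `bd_rate` with a fixed ladder as the reference and a predicted ladder as the test curve gives the percentage bitrate saving at equal quality. Ladders with more than four points would need a least-squares or piecewise fit instead of the exact interpolation used here.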