Abstract:Volumetric video streaming offers immersive 3D experiences but faces significant challenges due to high bandwidth requirements and latency issues in transmitting detailed content in real time. Traditional methods like point cloud streaming compromise visual quality when zoomed in, and neural rendering techniques are too computationally intensive for real-time use. Though mesh-based streaming stands out by preserving surface detail and connectivity, offering a more refined representation for 3D content, traditional mesh streaming methods typically transmit data on a per-frame basis, failing to take full advantage of temporal redundancies across frames. This results in inefficient bandwidth usage and poor adaptability to fluctuating network conditions. We introduce Deformation-based Adaptive Volumetric Video Streaming, a novel framework that enhances volumetric video streaming performance by leveraging the inherent deformability of mesh-based representations. DeformStream uses embedded deformation to reconstruct subsequent frames from inter-frame motion, significantly reducing bandwidth usage while ensuring visual coherence between frames. To address frame reconstruction overhead and network adaptability, we formulate a new QoE model that accounts for client-side deformation latency and design a dynamic programming algorithm to optimize the trade-off between visual quality and bandwidth consumption under varying network conditions. Our evaluation demonstrates that Deformation-based Adaptive Volumetric Video Streaming outperforms existing mesh-based streaming systems in both bandwidth efficiency and visual quality, offering a robust solution for real-time volumetric video applications.
Abstract:The high-accuracy and resource-intensive deep neural networks (DNNs) have been widely adopted by live video analytics (VA), where camera videos are streamed over the network to resource-rich edge/cloud servers for DNN inference. Common video encoding configurations (e.g., resolution and frame rate) have been identified with significant impacts on striking the balance between bandwidth consumption and inference accuracy and therefore their adaption scheme has been a focus of optimization. However, previous profiling-based solutions suffer from high profiling cost, while existing deep reinforcement learning (DRL) based solutions may achieve poor performance due to the usage of fixed reward function for training the agent, which fails to craft the application goals in various scenarios. In this paper, we propose ILCAS, the first imitation learning (IL) based configuration-adaptive VA streaming system. Unlike DRL-based solutions, ILCAS trains the agent with demonstrations collected from the expert which is designed as an offline optimal policy that solves the configuration adaption problem through dynamic programming. To tackle the challenge of video content dynamics, ILCAS derives motion feature maps based on motion vectors which allow ILCAS to visually ``perceive'' video content changes. Moreover, ILCAS incorporates a cross-camera collaboration scheme to exploit the spatio-temporal correlations of cameras for more proper configuration selection. Extensive experiments confirm the superiority of ILCAS compared with state-of-the-art solutions, with 2-20.9% improvement of mean accuracy and 19.9-85.3% reduction of chunk upload lag.