Abstract: In this paper, we propose a cross-layer encrypted semantic communication (CLESC) framework for panoramic video transmission. The framework incorporates feature extraction, encoding, encryption, cyclic redundancy check (CRC), and retransmission processes to make semantic communication compatible with traditional communication systems. We also propose an adaptive cross-layer transmission mechanism that dynamically adjusts the CRC, channel coding, and retransmission schemes according to the importance of the semantic information, so that important information is prioritized under poor transmission conditions. To validate the framework, we design an end-to-end adaptive panoramic video semantic transmission (APVST) network that leverages a deep joint source-channel coding (Deep JSCC) structure and an attention mechanism, integrated with a latitude-adaptive module that enables adaptive semantic feature extraction and variable-length encoding of panoramic videos. The proposed CLESC framework is also applicable to the transmission of data of other modalities. Simulation results demonstrate that CLESC achieves compatibility and adaptation between semantic communication and traditional communication systems, improving both transmission efficiency and channel adaptability. Compared with traditional cross-layer transmission schemes, the CLESC framework reduces bandwidth consumption by 85% and shows significant advantages under low signal-to-noise ratio (SNR) conditions.
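The importance-driven adjustment of CRC, channel coding, and retransmission described above can be pictured with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the three protection levels, the importance thresholds (0.3 and 0.7), the CRC lengths, code rates, retransmission limits, the 5 dB "poor channel" threshold, and the names `ProtectionProfile` and `select_profile` are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProtectionProfile:
    """Link-level protection applied to one block of semantic features."""
    crc_bits: int             # length of the CRC attached to the block
    code_rate: float          # channel code rate (lower means more redundancy)
    max_retransmissions: int  # retransmission attempts allowed on CRC failure


# Hypothetical profiles: stronger protection for more important features.
PROFILES = {
    "high": ProtectionProfile(crc_bits=24, code_rate=1 / 3, max_retransmissions=3),
    "medium": ProtectionProfile(crc_bits=16, code_rate=1 / 2, max_retransmissions=1),
    "low": ProtectionProfile(crc_bits=8, code_rate=3 / 4, max_retransmissions=1),
}


def select_profile(importance, snr_db, low_snr_db=5.0):
    """Pick a protection profile from an importance score in [0, 1].

    The thresholds are illustrative assumptions. Under a poor channel
    (snr_db below low_snr_db), low-importance blocks lose their
    retransmissions so that resources go to important blocks first.
    """
    if not 0.0 <= importance <= 1.0:
        raise ValueError("importance must lie in [0, 1]")
    if importance >= 0.7:
        level = "high"
    elif importance >= 0.3:
        level = "medium"
    else:
        level = "low"
    profile = PROFILES[level]
    if level == "low" and snr_db < low_snr_db:
        profile = replace(profile, max_retransmissions=0)
    return profile
```

In this sketch, a call such as `select_profile(0.9, snr_db=2.0)` returns the strongest profile regardless of channel state, while a low-importance block sent over the same channel is transmitted once and never retransmitted.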
Abstract: In this paper, we propose an adaptive panoramic video semantic transmission (APVST) network built on the deep joint source-channel coding (Deep JSCC) structure for efficient end-to-end transmission of panoramic videos. The proposed APVST network adaptively extracts the semantic features of panoramic frames and encodes them. To achieve high spectral efficiency and save bandwidth, we propose a transmission rate control mechanism for the APVST based on an entropy model and a latitude adaptive model. In addition, we adopt the weighted-to-spherically-uniform peak signal-to-noise ratio (WS-PSNR) and the weighted-to-spherically-uniform structural similarity (WS-SSIM) as distortion evaluation metrics, and propose a weight attention module that fuses the corresponding weights with the semantic features to improve the quality of the immersive experience. Finally, we evaluate the proposed scheme on a dataset of 208 panoramic videos. The simulation results show that the APVST can save up to 20% and 50% of channel bandwidth compared with other semantic communication-based schemes and traditional video transmission schemes, respectively.
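For readers unfamiliar with WS-PSNR, the following sketch shows how the metric is commonly computed for a single-channel frame in equirectangular projection (ERP). It assumes ERP frames and the usual cosine-of-latitude row weights; the abstract does not state the projection format, and the function names `erp_row_weights` and `ws_psnr` are chosen here for illustration only. It is not the paper's implementation.

```python
import math


def erp_row_weights(height):
    """Per-row weights for an ERP frame: cosine of the latitude at each row centre.

    Rows near the poles are heavily stretched by the projection, so they
    receive small weights; rows near the equator receive weights close to 1.
    """
    return [math.cos((j + 0.5 - height / 2) * math.pi / height) for j in range(height)]


def ws_psnr(reference, reconstructed, max_value=255.0):
    """WS-PSNR in dB between two frames given as lists of rows of pixel values."""
    weights = erp_row_weights(len(reference))
    weighted_error = 0.0
    weight_sum = 0.0
    for w, ref_row, rec_row in zip(weights, reference, reconstructed):
        for ref_px, rec_px in zip(ref_row, rec_row):
            weighted_error += w * (ref_px - rec_px) ** 2
            weight_sum += w
    wmse = weighted_error / weight_sum  # weighted mean squared error
    if wmse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / wmse)
```

Because the weights shrink toward the poles, the same pixel error lowers the score more when it occurs near the equator than near a pole, which is the motivation for treating latitudes differently in rate control and attention.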