Abstract: Most existing semantic communication (SemCom) systems use deep joint source-channel coding (DeepJSCC) to encode task-specific semantics in a goal-oriented manner. However, their reliance on predefined tasks and datasets significantly limits their flexibility and generalizability in practical deployments. Multi-modal foundation models provide a promising solution by generating universal semantic tokens. Inspired by this, we introduce SemCLIP, a task-agnostic SemCom framework leveraging the contrastive language-image pre-training (CLIP) model. By transmitting CLIP-generated image tokens instead of raw images, SemCLIP enables efficient semantic communications under low bandwidth and challenging channel conditions, facilitating diverse downstream tasks and zero-shot applications. Specifically, we propose a DeepJSCC scheme for efficient encoding of CLIP tokens. To mitigate potential degradation caused by compression and channel noise, a multi-modal transmission-aware prompt learning mechanism is designed at the receiver, which adapts prompts based on transmission quality, enhancing system robustness and channel adaptability. Simulation results demonstrate that SemCLIP outperforms the baselines, achieving a $41\%$ improvement in zero-shot accuracy at a low signal-to-noise ratio. Meanwhile, SemCLIP reduces bandwidth usage by more than $50$-fold compared with other image transmission methods, demonstrating the potential of foundation models towards a generalized, task-agnostic SemCom solution.
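The idea of sending an image token over a noisy channel and classifying it zero-shot at the receiver can be illustrated with a minimal sketch. This is not the SemCLIP implementation: the DeepJSCC codec and the transmission-aware prompt learning are omitted, and the four-dimensional vectors below are hypothetical stand-ins for real CLIP image and text embeddings. The helper names `awgn`, `cosine`, and `zero_shot_label` are assumptions made for illustration.

```python
import math
import random

def awgn(vec, snr_db, rng):
    """Add white Gaussian noise to a real vector at the given SNR in dB."""
    power = sum(x * x for x in vec) / len(vec)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in vec]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_label(received, text_embeddings):
    """Return the label whose text embedding is most similar to the received token."""
    return max(text_embeddings, key=lambda label: cosine(received, text_embeddings[label]))

# Hypothetical toy embeddings; a real system would obtain these from CLIP.
text_embeddings = {
    "cat": [1.0, 0.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0, 0.0],
    "car": [0.0, 0.0, 1.0, 0.0],
}
image_token = [0.9, 0.1, 0.05, 0.0]

rng = random.Random(0)
received = awgn(image_token, snr_db=20.0, rng=rng)
predicted = zero_shot_label(received, text_embeddings)
print(predicted)
```

Because the receiver only compares the received token against text embeddings, new classes can be added by extending `text_embeddings` without retraining anything on the transmitter side, which is the sense in which such a pipeline is task-agnostic.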
Abstract: Intelligent task-oriented semantic communications (SemComs) have witnessed great progress with the development of deep learning (DL). In this paper, we propose a semantic-aware hybrid automatic repeat request (SemHARQ) framework for the robust and efficient transmission of semantic features. First, to improve the robustness and effectiveness of semantic coding, a multi-task semantic encoder is proposed. Meanwhile, a feature importance ranking (FIR) method is investigated to ensure the delivery of important features under limited channel resources. Then, to accurately detect possible transmission errors, a novel feature distortion evaluation (FDE) network is designed to identify the distortion level of each feature, based on which an efficient HARQ method is proposed. Specifically, corrupted features are retransmitted, and the remaining channel resources are used for incremental transmissions. The system performance is evaluated under different channel conditions in multi-task scenarios in the Internet of Vehicles. Extensive experiments show that the proposed framework outperforms state-of-the-art works by more than 20% in rank-1 accuracy for vehicle re-identification and 10% in vehicle color classification accuracy in the low signal-to-noise ratio regime.
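The retransmission rule described above (resend corrupted features first, then spend leftover channel resources on features not yet sent) can be sketched as follows. This is an illustrative assumption, not the SemHARQ algorithm itself: the function `plan_round` and its arguments are hypothetical, the set of distorted features is given directly instead of being produced by an FDE network, and the channel budget is modelled simply as a count of features per round.

```python
def plan_round(importance, delivered, distorted, budget):
    """Pick feature indices for the next HARQ round.

    importance: list of scores, one per feature (higher means more important).
    delivered:  set of indices already received intact.
    distorted:  set of indices the receiver flagged as corrupted.
    budget:     number of features that fit in this round.
    Returns (retransmit, fresh), both ordered by decreasing importance.
    """
    by_rank = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    # Corrupted features take priority, most important first.
    retransmit = [i for i in by_rank if i in distorted][:budget]
    # Leftover budget goes to incremental transmission of unsent features.
    remaining = budget - len(retransmit)
    fresh = [i for i in by_rank if i not in delivered and i not in distorted][:remaining]
    return retransmit, fresh

importance = [0.9, 0.2, 0.7, 0.4, 0.1]

# Round 1: nothing sent yet, so the three most important features go out.
first = plan_round(importance, delivered=set(), distorted=set(), budget=3)

# Round 2: suppose feature 2 was flagged as distorted and features 0 and 3 arrived intact.
second = plan_round(importance, delivered={0, 3}, distorted={2}, budget=3)
print(first, second)
```

In the second round, feature 2 is resent and the two remaining slots carry features 1 and 4, which mirrors the combination of retransmission and incremental transmission in the abstract.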
Abstract: Semantic communications are expected to be an innovative solution for emerging intelligent applications in the era of connected intelligence. In this paper, a novel scalable multitask semantic communication system with feature importance ranking (SMSC-FIR) is explored. First, the multi-task correlations are investigated by a joint semantic encoder to extract relevant features. Then, a new scalable coding method is proposed based on feature importance ranking, which dynamically adjusts the coding rate and guarantees that important features for semantic tasks are transmitted with higher priority. Simulation results show that SMSC-FIR achieves performance gains with respect to individual intelligent tasks, especially in the low SNR regime.
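One simple way to picture rate-scalable coding driven by an importance ranking is to keep only the top-ranked fraction of features for a given coding rate. The sketch below is an assumption made for illustration, not the SMSC-FIR method: the function `scalable_select`, the representation of the rate as a fraction in (0, 1], and the example scores are all hypothetical.

```python
import math

def scalable_select(features, importance, rate):
    """Keep the top ceil(rate * n) features by importance.

    Returns a list of (index, value) pairs ordered by decreasing importance,
    so a receiver that decodes only a prefix still gets the most important features.
    """
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    n = len(features)
    keep = math.ceil(rate * n)
    by_rank = sorted(range(n), key=lambda i: importance[i], reverse=True)
    return [(i, features[i]) for i in by_rank[:keep]]

features = [0.3, -1.2, 0.8, 0.05]
importance = [0.5, 0.9, 0.7, 0.1]

half_rate = scalable_select(features, importance, rate=0.5)
full_rate = scalable_select(features, importance, rate=1.0)
print(half_rate)
print(full_rate)
```

Lowering `rate` when the channel degrades shrinks the payload while preserving the ordering, so the features that matter most to the tasks are always the first to be sent.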