Abstract:Joint source-channel coding systems based on deep neural networks (DeepJSCC) have recently demonstrated remarkable performance in wireless image transmission. Existing methods primarily focus on minimizing distortion between the transmitted image and the reconstructed version at the receiver, often overlooking perceptual quality. This can lead to severe perceptual degradation when transmitting images under extreme conditions, such as low bandwidth compression ratios (BCRs) and low signal-to-noise ratios (SNRs). In this work, we propose SING, a novel two-stage JSCC framework that formulates the recovery of high-quality source images from corrupted reconstructions as an inverse problem. Depending on the availability of information about the DeepJSCC encoder/decoder and the channel at the receiver, SING can either approximate the stochastic degradation as a linear transformation, or leverage invertible neural networks (INNs) for precise modeling. Both approaches enable the seamless integration of diffusion models into the reconstruction process, enhancing perceptual quality. Experimental results demonstrate that SING outperforms DeepJSCC and other approaches, delivering superior perceptual quality even under extremely challenging conditions, including scenarios with significant distribution mismatches between the training and test data.
Abstract:Most existing semantic communication (SemCom) systems use deep joint source-channel coding (DeepJSCC) to encode task-specific semantics in a goal-oriented manner. However, their reliance on predefined tasks and datasets significantly limits their flexibility and generalizability in practical deployments. Multi-modal foundation models provide a promising solution by generating universal semantic tokens. Inspired by this, we introduce SemCLIP, a task-agnostic SemCom framework leveraging the contrastive language-image pre-training (CLIP) model. By transmitting CLIP-generated image tokens instead of raw images, SemCLIP enables efficient semantic communications under low bandwidth and challenging channel conditions, facilitating diverse downstream tasks and zero-shot applications. Specifically, we propose a DeepJSCC scheme for the efficient encoding of CLIP tokens. To mitigate potential degradation caused by compression and channel noise, we design a multi-modal transmission-aware prompt learning mechanism at the receiver, which adapts prompts based on transmission quality, enhancing system robustness and channel adaptability. Simulation results demonstrate that SemCLIP outperforms the baselines, achieving a $41\%$ improvement in zero-shot accuracy at a low signal-to-noise ratio. Meanwhile, SemCLIP reduces bandwidth usage by more than $50$-fold compared to different image transmission methods, demonstrating the potential of foundation models towards a generalized, task-agnostic SemCom solution.
Abstract:Token communications is an emerging generative semantic communication concept that reduces transmission rates by using context and transformer-based token processing, with tokens serving as universal semantic units. In this paper, we propose a semantic multiple access scheme in the token domain, referred to as ToDMA, where a large number of devices share a tokenizer and a modulation codebook for source and channel coding, respectively. Specifically, the source signal is tokenized into sequences, with each token modulated into a codeword. Codewords from multiple devices are transmitted simultaneously, resulting in overlap at the receiver. The receiver detects the transmitted tokens, assigns them to their respective sources, and mitigates token collisions by leveraging context and semantic orthogonality across the devices' messages. Simulations demonstrate that the proposed ToDMA framework outperforms context-unaware orthogonal and non-orthogonal communication methods in image transmission tasks, achieving lower latency and better image quality.
Abstract:Efficient channel state information (CSI) compression is crucial in frequency division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems due to excessive feedback overhead. Recently, deep learning-based compression techniques have demonstrated superior performance across various data types, including CSI. However, these approaches often experience performance degradation when the data distribution changes due to their limited generalization capabilities. To address this challenge, we propose a model fine-tuning approach for CSI feedback in massive MIMO systems. The idea is to fine-tune the encoder/decoder network models in a dynamic fashion using the recent CSI samples. First, we explore encoder-only fine-tuning, where only the encoder parameters are updated, leaving the decoder and latent parameters unchanged. Next, we consider full-model fine-tuning, where the encoder and decoder models are jointly updated. Unlike encoder-only fine-tuning, full-model fine-tuning requires the updated decoder and latent parameters to be transmitted to the decoder side. To handle this efficiently, we propose different prior distributions for the model updates, such as uniform and truncated Gaussian, so that the updates can be entropy-coded together with the compressed CSI and the additional feedback overhead of conveying them is accounted for. Moreover, we incorporate quantized model updates during fine-tuning to reflect the impact of quantization in the deployment phase. Our results demonstrate that full-model fine-tuning significantly enhances the rate-distortion (RD) performance of neural CSI compression. Furthermore, we analyze how often full-model fine-tuning should be applied in a new wireless environment and identify an optimal period for achieving the best RD trade-off.
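As a rough illustration of how a prior over quantized model updates translates into feedback overhead, the following sketch computes the ideal code length (in bits) of a single quantized update under a uniform prior and under a discretized, truncated zero-mean Gaussian prior. It is not the authors' implementation: the function names, the quantization step, the clipping range, and the value of `sigma` are all assumptions made for the example, and the ideal code length stands in for an actual entropy coder.

```python
import math


def gaussian_cdf(x, sigma):
    """CDF of a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def quantize(delta, step, max_index):
    """Uniformly quantize an update to an integer index in [-max_index, max_index]."""
    q = round(delta / step)
    return max(-max_index, min(max_index, q))


def bits_uniform(max_index):
    """Ideal code length when all 2*max_index + 1 indices are equally likely."""
    return math.log2(2 * max_index + 1)


def bits_truncated_gaussian(q, step, max_index, sigma):
    """Ideal code length of index q under a Gaussian truncated to the quantizer range."""
    half_range = (max_index + 0.5) * step
    # Probability mass of the untruncated Gaussian inside the quantizer range.
    norm = gaussian_cdf(half_range, sigma) - gaussian_cdf(-half_range, sigma)
    # Probability mass of the quantization bin centred at q * step.
    mass = gaussian_cdf((q + 0.5) * step, sigma) - gaussian_cdf((q - 0.5) * step, sigma)
    return -math.log2(mass / norm)


if __name__ == "__main__":
    step, max_index, sigma = 0.1, 3, 0.1  # hypothetical settings
    for delta in (0.01, 0.12, 0.31):
        q = quantize(delta, step, max_index)
        print(delta, q, bits_uniform(max_index),
              bits_truncated_gaussian(q, step, max_index, sigma))
```

Under these assumed settings, small updates near zero cost fewer bits with the Gaussian prior than with the uniform one, while large updates cost more, which is the kind of trade-off that the choice of prior controls.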
Abstract:Joint source-channel coding (JSCC) offers a promising avenue for enhancing transmission efficiency by jointly incorporating source and channel statistics into the system design. A key advancement in this area is the deep joint source and channel coding (DeepJSCC) technique that designs a direct mapping of input signals to channel symbols parameterized by a neural network, which can be trained for arbitrary channel models and semantic quality metrics. This paper advances the DeepJSCC framework toward a semantics-aligned, high-fidelity transmission approach, called semantics-guided diffusion DeepJSCC (SGD-JSCC). Existing schemes that integrate diffusion models (DMs) with JSCC face challenges in transforming random generation into accurate reconstruction and adapting to varying channel conditions. SGD-JSCC incorporates two key innovations: (1) utilizing some inherent information that contributes to the semantics of an image, such as text description or edge map, to guide the diffusion denoising process; and (2) enabling seamless adaptability to varying channel conditions with the help of a semantics-guided DM for channel denoising. The DM is guided by diverse semantic information and integrates seamlessly with DeepJSCC. In a slow fading channel, SGD-JSCC dynamically adapts to the instantaneous signal-to-noise ratio (SNR) directly estimated from the channel output, thereby eliminating the need for additional pilot transmissions for channel estimation. In a fast fading channel, we introduce a training-free denoising strategy, allowing SGD-JSCC to effectively adjust to fluctuations in channel gains. Numerical results demonstrate that, guided by semantic information and leveraging the powerful DM, our method outperforms existing DeepJSCC schemes, delivering satisfactory reconstruction performance even at extremely poor channel conditions.
Abstract:Deep neural network (DNN)-based joint source and channel coding is proposed for end-to-end secure image transmission against multiple eavesdroppers. Both scenarios of colluding and non-colluding eavesdroppers are considered. Instead of idealistic assumptions of perfectly known and i.i.d. source and channel distributions, the proposed scheme assumes unknown source and channel statistics. The goal is to transmit images with minimum distortion, while simultaneously preventing eavesdroppers from inferring private attributes of images. Simultaneously generalizing the ideas of privacy funnel and wiretap coding, a multi-objective optimization framework is formulated that characterizes the trade-off between image reconstruction quality and information leakage to eavesdroppers, taking into account the structural similarity index (SSIM) for improving the perceptual quality of image reconstruction. Extensive experiments over the CIFAR-10 and CelebFaces Attributes (CelebA) datasets, together with ablation studies, are provided to highlight the performance gain in terms of SSIM, adversarial accuracy, and cross-entropy metric compared with benchmarks. Experiments show that the proposed scheme restrains the adversarially-trained eavesdroppers from intercepting privatized data, both when the eavesdroppers target a common secret and when they are interested in different secrets. Furthermore, useful insights on the privacy-utility trade-off are also provided.
Abstract:The growing demand for intelligent applications beyond the network edge, coupled with the need for sustainable operation, is driving the seamless integration of deep learning (DL) algorithms into energy-limited, and even energy-harvesting, end devices. However, the stochastic nature of ambient energy sources often results in insufficient harvesting rates, failing to meet the energy requirements for inference and causing significant performance degradation in energy-agnostic systems. To address this problem, we consider an on-device adaptive inference system equipped with an energy harvester and finite-capacity energy storage. We then allow the device to reduce the run-time execution cost on demand, by either switching between differently sized neural networks, referred to as multi-model selection (MMS), or by enabling earlier predictions at intermediate layers, called early exiting (EE). The model to be employed, or the exit point, is then dynamically chosen based on the energy storage and harvesting process states. We also study the efficacy of integrating the prediction confidence into the decision-making process. We derive a principled policy with theoretical guarantees for confidence-aware and -agnostic controllers. Moreover, in multi-exit networks, we study the advantages of taking decisions incrementally, exit-by-exit, by designing a lightweight reinforcement learning-based controller. Experimental results show that, as the rate of the ambient energy increases, energy- and confidence-aware control schemes show approximately 5% improvement in accuracy compared to their energy-aware confidence-agnostic counterparts. Incremental approaches achieve even higher accuracy, particularly when the energy storage capacity is limited relative to the energy consumption of the inference model.
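To make the idea of energy-aware multi-model selection concrete, the sketch below simulates a device with finite energy storage that, at each time step, harvests some energy and then greedily runs the most accurate model it can currently afford. This greedy rule is only a simple stand-in chosen for illustration; it is not the principled policy or the reinforcement learning-based controller described in the abstract, and the `Model` fields, energy units, and accuracy values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    energy_cost: float  # energy consumed by one inference (hypothetical units)
    accuracy: float     # expected accuracy of the model (hypothetical value)


def select_model(models, stored_energy):
    """Return the most accurate model whose cost fits the stored energy, or None."""
    affordable = [m for m in models if m.energy_cost <= stored_energy]
    return max(affordable, key=lambda m: m.accuracy) if affordable else None


def simulate(models, harvest, capacity, initial=0.0):
    """Apply the greedy rule over a harvest trace; return the chosen model names."""
    energy = initial
    choices = []
    for harvested in harvest:
        # Harvested energy beyond the storage capacity is lost.
        energy = min(capacity, energy + harvested)
        model = select_model(models, energy)
        if model is not None:
            energy -= model.energy_cost
        choices.append(model.name if model is not None else None)
    return choices


if __name__ == "__main__":
    models = [Model("small", 1.0, 0.70), Model("large", 3.0, 0.90)]
    print(simulate(models, harvest=[1.0, 1.0, 4.0, 0.0], capacity=5.0))
```

A greedy rule like this spends energy as soon as it is available; a policy that also accounts for the harvesting process state or the prediction confidence, as studied in the abstract, could instead defer or downgrade inference to save energy for later inputs.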
Abstract:Collaborative perception (CP) is emerging as a promising solution to the inherent limitations of stand-alone intelligence. However, current wireless communication systems are unable to support feature-level and raw-level collaborative algorithms due to their enormous bandwidth demands. In this paper, we propose DiffCP, a novel CP paradigm that utilizes a specialized diffusion model to efficiently compress the sensing information of collaborators. By incorporating both geometric and semantic conditions into the generative model, DiffCP enables feature-level collaboration with an ultra-low communication cost, advancing the practical implementation of CP systems. This paradigm can be seamlessly integrated into existing CP algorithms to enhance a wide range of downstream tasks. Through extensive experimentation, we investigate the trade-offs between communication, computation, and performance. Numerical results demonstrate that DiffCP can significantly reduce communication costs by 14.5-fold while maintaining the same performance as the state-of-the-art algorithm.
Abstract:Semantic- and task-oriented communication has emerged as a promising approach to reducing the latency and bandwidth requirements of next-generation mobile networks by transmitting only the most relevant information needed to complete a specific task at the receiver. This is particularly advantageous for machine-oriented communication of high data rate content, such as images and videos, where the goal is rapid and accurate inference, rather than perfect signal reconstruction. While semantic- and task-oriented compression can be implemented in conventional communication systems, joint source-channel coding (JSCC) offers an alternative end-to-end approach by optimizing compression and channel coding together, or even directly mapping the source signal to the modulated waveform. Although all digital communication systems today rely on separation, thanks to its modularity, JSCC is known to achieve higher performance in finite blocklength scenarios, and to avoid the cliff and levelling-off effects in time-varying channel scenarios. This article provides an overview of the information theoretic foundations of JSCC, surveys practical JSCC designs over the decades, and discusses the reasons for their limited adoption in practical systems. We then examine the recent resurgence of JSCC, driven by the integration of deep learning techniques, particularly through DeepJSCC, highlighting its many surprising advantages in various scenarios. Finally, we discuss why it may be time to reconsider today's strictly separate architectures, and reintroduce JSCC to enable high-fidelity, low-latency communications in critical applications such as autonomous driving, drone surveillance, or wearable systems.
Abstract:In this work, we propose a Gaussian mixture model (GMM)-based pilot design scheme for downlink (DL) channel estimation in single- and multi-user multiple-input multiple-output (MIMO) frequency division duplex (FDD) systems. In an initial offline phase, the GMM captures prior information during training, which is then utilized for pilot design. In the single-user case, the GMM is utilized to construct a codebook of pilot matrices and, once shared with the mobile terminal (MT), can be employed to determine a feedback index at the MT. This index selects a pilot matrix from the constructed codebook, eliminating the need for online pilot optimization. We further establish a sum conditional mutual information (CMI)-based pilot optimization framework for multi-user MIMO (MU-MIMO) systems. Based on the established framework, we utilize the GMM for pilot matrix design in MU-MIMO systems. The analytic representation of the GMM enables adaptation to any signal-to-noise ratio (SNR) level and pilot configuration without re-training. Additionally, adaptation to any number of MTs is facilitated. Extensive simulations demonstrate the superior performance of the proposed pilot design scheme compared to state-of-the-art approaches. The performance gains can be exploited, e.g., to deploy systems with fewer pilots.