Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jincheng Dai

NeRFCom: Feature Transform Coding Meets Neural Radiance Field for Free-View 3D Scene Semantic Transmission

Feb 27, 2025

Weijie Yue, Zhongwei Si, Bolin Wu, Sixian Wang, Xiaoqi Qin, Kai Niu, Jincheng Dai, Ping Zhang

Abstract:We introduce NeRFCom, a novel communication system designed for end-to-end 3D scene transmission. Compared to traditional systems relying on handcrafted NeRF semantic feature decomposition for compression and well-adaptive channel coding for transmission error correction, our NeRFCom employs a nonlinear transform and learned probabilistic models, enabling flexible variable-rate joint source-channel coding and efficient bandwidth allocation aligned with the NeRF semantic feature's different contribution to the 3D scene synthesis fidelity. Experimental results demonstrate that NeRFCom achieves free-view 3D scene efficient transmission while maintaining robustness under adverse channel conditions.

Via

Access Paper or Ask Questions

Rate-Distortion-Perception Controllable Joint Source-Channel Coding for High-Fidelity Generative Communications

Aug 26, 2024

Kailin Tan, Jincheng Dai, Zhenyu Liu, Sixian Wang, Xiaoqi Qin, Wenjun Xu, Kai Niu, Ping Zhang

Abstract:End-to-end image transmission has recently become a crucial trend in intelligent wireless communications, driven by the increasing demand for high bandwidth efficiency. However, existing methods primarily optimize the trade-off between bandwidth cost and objective distortion, often failing to deliver visually pleasing results aligned with human perception. In this paper, we propose a novel rate-distortion-perception (RDP) jointly optimized joint source-channel coding (JSCC) framework to enhance perception quality in human communications. Our RDP-JSCC framework integrates a flexible plug-in conditional Generative Adversarial Networks (GANs) to provide detailed and realistic image reconstructions at the receiver, overcoming the limitations of traditional rate-distortion optimized solutions that typically produce blurry or poorly textured images. Based on this framework, we introduce a distortion-perception controllable transmission (DPCT) model, which addresses the variation in the perception-distortion trade-off. DPCT uses a lightweight spatial realism embedding module (SREM) to condition the generator on a realism map, enabling the customization of appearance realism for each image region at the receiver from a single transmission. Furthermore, for scenarios with scarce bandwidth, we propose an interest-oriented content-controllable transmission (CCT) model. CCT prioritizes the transmission of regions that attract user attention and generates other regions from an instance label map, ensuring both content consistency and appearance realism for all regions while proportionally reducing channel bandwidth costs. Comprehensive experiments demonstrate the superiority of our RDP-optimized image transmission framework over state-of-the-art engineered image transmission systems and advanced perceptual methods.

Via

Access Paper or Ask Questions

DiffCom: Channel Received Signal is a Natural Condition to Guide Diffusion Posterior Sampling

Jun 11, 2024

Sixian Wang, Jincheng Dai, Kailin Tan, Xiaoqi Qin, Kai Niu, Ping Zhang

Figure 1 for DiffCom: Channel Received Signal is a Natural Condition to Guide Diffusion Posterior Sampling

Figure 2 for DiffCom: Channel Received Signal is a Natural Condition to Guide Diffusion Posterior Sampling

Figure 3 for DiffCom: Channel Received Signal is a Natural Condition to Guide Diffusion Posterior Sampling

Figure 4 for DiffCom: Channel Received Signal is a Natural Condition to Guide Diffusion Posterior Sampling

Abstract:End-to-end visual communication systems typically optimize a trade-off between channel bandwidth costs and signal-level distortion metrics. However, under challenging physical conditions, this traditional discriminative communication paradigm often results in unrealistic reconstructions with perceptible blurring and aliasing artifacts, despite the inclusion of perceptual or adversarial losses for optimizing. This issue primarily stems from the receiver's limited knowledge about the underlying data manifold and the use of deterministic decoding mechanisms. To address these limitations, this paper introduces DiffCom, a novel end-to-end generative communication paradigm that utilizes off-the-shelf generative priors and probabilistic diffusion models for decoding, thereby improving perceptual quality without heavily relying on bandwidth costs and received signal quality. Unlike traditional systems that rely on deterministic decoders optimized solely for distortion metrics, our DiffCom leverages raw channel-received signal as a fine-grained condition to guide stochastic posterior sampling. Our approach ensures that reconstructions remain on the manifold of real data with a novel confirming constraint, enhancing the robustness and reliability of the generated outcomes. Furthermore, DiffCom incorporates a blind posterior sampling technique to address scenarios with unknown forward transmission characteristics. Extensive experimental validations demonstrate that DiffCom not only produces realistic reconstructions with details faithful to the original data but also achieves superior robustness against diverse wireless transmission degradations. Collectively, these advancements establish DiffCom as a new benchmark in designing generative communication systems that offer enhanced robustness and generalization superiorities.

Via

Access Paper or Ask Questions

Deep Generative Modeling Reshapes Compression and Transmission: From Efficiency to Resiliency

Jun 10, 2024

Jincheng Dai, Xiaoqi Qin, Sixian Wang, Lexi Xu, Kai Niu, Ping Zhang

Figure 1 for Deep Generative Modeling Reshapes Compression and Transmission: From Efficiency to Resiliency

Figure 2 for Deep Generative Modeling Reshapes Compression and Transmission: From Efficiency to Resiliency

Figure 3 for Deep Generative Modeling Reshapes Compression and Transmission: From Efficiency to Resiliency

Figure 4 for Deep Generative Modeling Reshapes Compression and Transmission: From Efficiency to Resiliency

Abstract:Information theory and machine learning are inextricably linked and have even been referred to as "two sides of the same coin". One particularly elegant connection is the essential equivalence between probabilistic generative modeling and data compression or transmission. In this article, we reveal the dual-functionality of deep generative models that reshapes both data compression for efficiency and transmission error concealment for resiliency. We present how the contextual predictive capabilities of powerful generative models can be well positioned to be strong compressors and estimators. In this sense, we advocate for viewing the deep generative modeling problem through the lens of end-to-end communications, and evaluate the compression and error restoration capabilities of foundation generative models. We show that the kernel of many large generative models is powerful predictor that can capture complex relationships among semantic latent variables, and the communication viewpoints provide novel insights into semantic feature tokenization, contextual learning, and usage of deep generative models. In summary, our article highlights the essential connections of generative AI to source and channel coding techniques, and motivates researchers to make further explorations in this emerging topic.

* Publication in IEEE Wireless Communications

Via

Access Paper or Ask Questions

Sequence can Secretly Tell You What to Discard

Apr 24, 2024

Jincheng Dai, Zhuowei Huang, Haiyun Jiang, Chen Chen, Deng Cai, Wei Bi, Shuming Shi

Abstract:Large Language Models (LLMs), despite their impressive performance on a wide range of tasks, require significant GPU memory and consume substantial computational resources. In addition to model weights, the memory occupied by KV cache increases linearly with sequence length, becoming a main bottleneck for inference. In this paper, we introduce a novel approach for optimizing the KV cache which significantly reduces its memory footprint. Through a comprehensive investigation, we find that on LLaMA2 series models, (i) the similarity between adjacent tokens' query vectors is remarkably high, and (ii) current query's attention calculation can rely solely on the attention information of a small portion of the preceding queries. Based on these observations, we propose CORM, a KV cache eviction policy that dynamically retains important key-value pairs for inference without finetuning the model. We validate that CORM reduces the inference memory usage of KV cache by up to 70% without noticeable performance degradation across six tasks in LongBench.

Via

Access Paper or Ask Questions

Semantics-Division Duplexing: A Novel Full-Duplex Paradigm

Dec 14, 2023

Kai Niu, Zijian Liang, Chao Dong, Jincheng Dai, Zhongwei Si, Ping Zhang

Abstract:In-band full-duplex (IBFD) is a theoretically effective solution to increase the overall throughput for the future wireless communications system by enabling transmission and reception over the same time-frequency resources. However, reliable source reconstruction remains a great challenge in the practical IBFD systems due to the non-ideal elimination of the self-interference and the inherent limitations of the separate source and channel coding methods. On the other hand, artificial intelligence-enabled semantic communication can provide a viable direction for the optimization of the IBFD system. This article introduces a novel IBFD paradigm with the guidance of semantic communication called semantics-division duplexing (SDD). It utilizes semantic domain processing to further suppress self-interference, distinguish the expected semantic information, and recover the desired sources. Further integration of the digital and semantic domain processing can be implemented so as to achieve intelligent and concise communications. We present the advantages of the SDD paradigm with theoretical explanations and provide some visualized results to verify its effectiveness.

* 9 pages, 5 figures, submitted to IEEE Wireless Communications Magazine

Via

Access Paper or Ask Questions

SwinJSCC: Taming Swin Transformer for Deep Joint Source-Channel Coding

Aug 18, 2023

Ke Yang, Sixian Wang, Jincheng Dai, Xiaoqi Qin, Kai Niu, Ping Zhang

Figure 1 for SwinJSCC: Taming Swin Transformer for Deep Joint Source-Channel Coding

Figure 2 for SwinJSCC: Taming Swin Transformer for Deep Joint Source-Channel Coding

Figure 3 for SwinJSCC: Taming Swin Transformer for Deep Joint Source-Channel Coding

Figure 4 for SwinJSCC: Taming Swin Transformer for Deep Joint Source-Channel Coding

Abstract:As one of the key techniques to realize semantic communications, end-to-end optimized neural joint source-channel coding (JSCC) has made great progress over the past few years. A general trend in many recent works pushing the model adaptability or the application diversity of neural JSCC is based on the convolutional neural network (CNN) backbone, whose model capacity is yet limited, inherently leading to inferior system coding gain against traditional coded transmission systems. In this paper, we establish a new neural JSCC backbone that can also adapt flexibly to diverse channel conditions and transmission rates within a single model, our open-source project aims to promote the research in this field. Specifically, we show that with elaborate design, neural JSCC codec built on the emerging Swin Transformer backbone achieves superior performance than conventional neural JSCC codecs built upon CNN, while also requiring lower end-to-end processing latency. Paired with two spatial modulation modules that scale latent representations based on the channel state information and target transmission rate, our baseline SwinJSCC can further upgrade to a versatile version, which increases its capability to adapt to diverse channel conditions and rate configurations. Extensive experimental results show that our SwinJSCC achieves better or comparable performance versus the state-of-the-art engineered BPG + 5G LDPC coded transmission system with much faster end-to-end coding speed, especially for high-resolution images, in which case traditional CNN-based JSCC yet falls behind due to its limited model capacity. \emph{Our open-source code and model are available at \href{https://github.com/semcomm/SwinJSCC}{https://github.com/semcomm/SwinJSCC}.}

Via

Access Paper or Ask Questions

Toward Intelligent and Efficient 6G Networks: JCSC Enabled On-Purpose Machine Communications

Jun 30, 2023

Ping Zhang, Heng Yang, Zhiyong Feng, Yanpeng Cui, Jincheng Dai, Xiaoqi Qin, Jinglin Li, Qixun Zhang

Abstract:Driven by the vision of "intelligent connection of everything" toward 6G, the collective intelligence of networked machines can be fully exploited to improve system efficiency by shifting the paradigm of wireless communication design from naive maximalist approaches to intelligent value-based approaches. In this article, we propose an on-purpose machine communication framework enabled by joint communication, sensing, and computation (JCSC) technology, which employs machine semantics as the interactive information flow. Naturally, there are potential technical barriers to be solved before the widespread adoption of on-purpose communications, including the conception of machine purpose, fast and concise networking strategy, and semantics-aware information exchange mechanism during the process of task-oriented cooperation. Hence, we discuss enabling technologies complemented by a range of open challenges. The simulation result shows that the proposed framework can significantly reduce networking overhead and improve communication efficiency.

* in IEEE Wireless Communications, vol. 30, no. 1, pp. 150-157, February 2023
* 8 pages, 6 figures

Via

Access Paper or Ask Questions

Dimensions of Semantic Coding: Explicit and Implicit

Mar 26, 2023

Jincheng Dai, Sixian Wang, Ping Zhang, Xiaoqi Qin, Kai Niu, Zhongwei Si

Abstract:Recent advances in deep learning have led to increased interest in solving high-efficiency end-to-end transmission problems using methods that employ the nonlinear property of neural networks. These methods, we call semantic coding, extract semantic features of the source signal across space and time, and design source-channel coding methods to transmit these features over wireless channels. Rapid progress has led to numerous research papers, but a consolidation of the discovered knowledge has not yet emerged. In this article, we gather ideas to categorize the expansive aspects on semantic coding as two paradigms, i.e., explicit and implicit semantic coding. We first focus on those two paradigms of semantic coding by identifying their common and different components in building semantic communication systems. We then focus on the applications of semantic coding to different transmission tasks. Our article highlights the improved quality, flexibility, and capability brought by semantic coded transmission. Finally, we point out future directions.

Via

Access Paper or Ask Questions

Improved Nonlinear Transform Source-Channel Coding to Catalyze Semantic Communications

Mar 26, 2023

Sixian Wang, Jincheng Dai, Xiaoqi Qin, Zhongwei Si, Kai Niu, Ping Zhang

Figure 1 for Improved Nonlinear Transform Source-Channel Coding to Catalyze Semantic Communications

Figure 2 for Improved Nonlinear Transform Source-Channel Coding to Catalyze Semantic Communications

Figure 3 for Improved Nonlinear Transform Source-Channel Coding to Catalyze Semantic Communications

Figure 4 for Improved Nonlinear Transform Source-Channel Coding to Catalyze Semantic Communications

Abstract:Recent deep learning methods have led to increased interest in solving high-efficiency end-to-end transmission problems. These methods, we call nonlinear transform source-channel coding (NTSCC), extract the semantic latent features of source signal, and learn entropy model to guide the joint source-channel coding with variable rate to transmit latent features over wireless channels. In this paper, we propose a comprehensive framework for improving NTSCC, thereby higher system coding gain, better model versatility, and more flexible adaptation strategy aligned with semantic guidance are all achieved. This new sophisticated NTSCC model is now ready to support large-size data interaction in emerging XR, which catalyzes the application of semantic communications. Specifically, we propose three useful improvement approaches. First, we introduce a contextual entropy model to better capture the spatial correlations among the semantic latent features, thereby more accurate rate allocation and contextual joint source-channel coding are developed accordingly to enable higher coding gain. On that basis, we further propose response network architectures to formulate versatile NTSCC, i.e., once-trained model supports various rates and channel states that benefits the practical deployment. Following this, we propose an online latent feature editing method to enable more flexible coding rate control aligned with some specific semantic guidance. By comprehensively applying the above three improvement methods for NTSCC, a deployment-friendly semantic coded transmission system stands out finally. Our improved NTSCC system has been experimentally verified to achieve 16.35% channel bandwidth saving versus the state-of-the-art engineered VTM + 5G LDPC coded transmission system with lower processing latency.

Via

Access Paper or Ask Questions