Abstract: The stability and reliability of wireless data transmission in vehicular networks face significant challenges because path loss is highly dynamic in complex, rapidly changing environments. This paper proposes a multi-modal environmental sensing-based path loss prediction architecture (MES-PLA) for vehicle-to-infrastructure (V2I) communications. First, we establish a joint acquisition platform for multi-modal environmental data and channel data to generate a spatio-temporally synchronized and aligned dataset. Then, we design a multi-modal feature extraction and fusion network (MFEF-Net) for the multi-modal environmental sensing data. MFEF-Net extracts features from RGB images, point cloud data, and GPS information, and integrates them with an attention mechanism to effectively leverage the strengths of each modality. Simulation results demonstrate that the root mean square error (RMSE) of MES-PLA is 2.20 dB, a notable improvement in prediction accuracy over single-modal sensing data input. Moreover, MES-PLA exhibits enhanced stability under varying illumination conditions compared with single-modal methods.
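The attention-based integration of modality features can be illustrated with a minimal sketch. It is not MFEF-Net itself: the function names, the scaled dot-product scoring, the feature dimension, and the random vectors standing in for RGB, point cloud, and GPS features are all assumptions made for illustration.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D array."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()


def attention_fuse(features, query):
    """Fuse per-modality feature vectors with attention weights.

    features: dict mapping a modality name to a (d,) feature vector.
    query:    (d,) vector used to score each modality (hypothetical).
    Returns the fused (d,) vector and the weight given to each modality.
    """
    names = list(features)
    stacked = np.stack([features[n] for n in names])       # shape (M, d)
    scores = stacked @ query / np.sqrt(stacked.shape[1])    # scaled dot product
    weights = softmax(scores)                               # sums to 1 over modalities
    fused = weights @ stacked                               # weighted sum, shape (d,)
    return fused, dict(zip(names, weights))


# Placeholder features; a real system would use learned encoders per modality.
rng = np.random.default_rng(0)
features = {name: rng.standard_normal(8) for name in ("rgb", "point_cloud", "gps")}
query = rng.standard_normal(8)
fused, weights = attention_fuse(features, query)
print(weights)
```

Because the weights are normalized across modalities, a modality whose features score poorly (for example, images under weak illumination) contributes less to the fused vector in this sketch.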
Abstract: As 6G research advances, growing demand is driving the emergence of novel technologies such as Integrated Sensing and Communication (ISAC), new antenna arrays like Extremely Large MIMO (XL-MIMO) and Reconfigurable Intelligent Surfaces (RIS), along with multiple frequency bands (6-24 GHz, above 100 GHz). Standardized unified channel models are crucial for research and performance evaluation across generations of mobile communication, but the existing 5G 3GPP channel model, which is based on the geometry-based stochastic model (GBSM), requires further extension to accommodate these 6G technologies. In response to this need, this article first investigates six distinctive channel characteristics introduced by 6G technologies, such as ISAC target radar cross section (RCS), sparsity in the new mid-band, and others. Subsequently, an extended GBSM (E-GBSM) is proposed, integrating these characteristics into a unified modeling framework. The proposed model not only accommodates 6G technologies with flexibility but also maintains backward compatibility with 5G, ensuring a smooth evolution between generations. Finally, the implementation process of the proposed model is detailed, and experiments and simulations validate its effectiveness and accuracy, providing support for 6G channel modeling standardization efforts.
Abstract: Terahertz (THz) integrated sensing and communication (ISAC) holds the potential to achieve high data rates and high-resolution sensing. Reconstructing the propagation environment is a vital step for THz ISAC, as it enhances the predictability of the communication channel and thereby reduces communication overhead. In this letter, we propose an environment reconstruction methodology (ERM) that merges the reflectors of multiple targets based on THz single-sided channel small-scale characteristics. In this method, the inclination and position of the tiny reflection face of a single multipath component (MPC) are first detected by double-triangle equations based on Snell's law and geometric properties. Then, the reflection faces of multi-target MPCs, filtered to retain only available first-order reflection MPCs, are globally merged to accurately reconstruct the entire propagation environment. The ERM can operate with only the small-scale parameters of the received MPCs. Subsequently, we validate the ERM through two experiments: bi-static ray-tracing simulations in an L-shaped room and channel measurements in an urban macrocellular (UMa) scenario in THz bands. The validation results demonstrate a small deviation of 0.03 m between the sensing outcomes and the predefined reflectors in the ray-tracing simulation, and small sensing root-mean-square errors of 1.28 m and 0.45 m in the line-of-sight and non-line-of-sight cases, respectively, based on channel measurements. Overall, this work is valuable for designing THz communication systems and facilitating the application of THz ISAC communication techniques.
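The geometric idea of locating a small reflection face from one first-order MPC can be sketched as follows. This is a simplified illustration, not the double-triangle equations of the letter: it assumes specular reflection (angle of incidence equals angle of reflection), a known transmitter position, and a hypothetical function name and argument set.

```python
import numpy as np


def first_order_reflector(tx, rx, arrival_dir, path_length):
    """Estimate the reflection point and face normal of a single-bounce MPC.

    tx, rx:       transmitter and receiver positions (assumed known here).
    arrival_dir:  vector pointing from the receiver toward the reflection point
                  (the reverse of the MPC's direction of travel at arrival).
    path_length:  total travelled distance, i.e. propagation delay times c.
    """
    tx = np.asarray(tx, dtype=float)
    rx = np.asarray(rx, dtype=float)
    u = np.asarray(arrival_dir, dtype=float)
    u = u / np.linalg.norm(u)
    d = rx - tx
    # The reflection point is rx + s*u, and |tx -> point| + s = path_length.
    # Squaring |d + s*u| = path_length - s gives a linear equation in s.
    s = (path_length**2 - d @ d) / (2.0 * (d @ u + path_length))
    point = rx + s * u
    # For specular reflection the face normal bisects the directions
    # from the reflection point toward the transmitter and the receiver.
    to_tx = (tx - point) / np.linalg.norm(tx - point)
    to_rx = (rx - point) / np.linalg.norm(rx - point)
    normal = to_tx + to_rx
    normal = normal / np.linalg.norm(normal)
    return point, normal


# Toy 2-D check: a wall along y = 2 between a TX at (0, 0) and an RX at (4, 0).
point, normal = first_order_reflector(
    tx=(0.0, 0.0), rx=(4.0, 0.0), arrival_dir=(-1.0, 1.0), path_length=np.sqrt(32.0)
)
print(point, normal)
```

Repeating this per MPC yields a set of small faces (point plus normal) that a later merging step could group into larger reflectors.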
Abstract: 6G is envisaged to provide multimodal sensing, pervasive intelligence, global coverage, etc., which introduces extreme intricacy and new challenges for network design and optimization. As the core part of 6G, the wireless channel is the carrier and enabler for flourishing technologies and novel services, and it intrinsically determines the ultimate system performance. However, describing and utilizing the complicated and highly dynamic characteristics of the wireless channel accurately and effectively remains a great challenge. To tackle this, the digital twin is envisioned as a powerful technology for migrating physical entities to the virtual and computational world. In this article, we propose a large-model-driven digital twin channel generator (ChannelGPT) embedded with environment intelligence (EI) to enable a pervasive intelligence paradigm for 6G networks. EI is an iterative and interactive procedure that boosts system performance through online environment adaptivity. First, ChannelGPT is capable of utilizing multimodal data from the wireless channel and the corresponding physical environment through its sensing ability. Then, based on the fine-tuned large model, ChannelGPT can simultaneously generate multi-scenario channel parameters, associated map information, and wireless knowledge according to each task requirement. Furthermore, with the support of online multidimensional channel and environment information, the network entity can make accurate and immediate decisions for each 6G system layer. In practice, we also establish a ChannelGPT prototype that generates high-fidelity channel data for varied scenarios to validate its accuracy and generalization ability based on environment intelligence.
Abstract: In federated learning, the heterogeneity of client data has a great impact on the performance of model training. Many heterogeneity issues in this process arise from non-independent and identically distributed (Non-IID) data. This study focuses on the issue of label distribution skew. To address it, we propose a hybrid federated learning framework called HFLDD, which integrates dataset distillation to generate approximately independent and identically distributed (IID) data, thereby improving the performance of model training. In particular, we partition the clients into heterogeneous clusters, where the data labels among different clients within a cluster are unbalanced while the data labels among different clusters are balanced. The cluster headers collect distilled data from the corresponding cluster members and conduct model training in collaboration with the server. This training process resembles traditional federated learning on IID data and hence effectively alleviates the impact of Non-IID data on model training. Furthermore, we compare the proposed method with typical baseline methods on public datasets. Experimental results demonstrate that when the data labels are severely imbalanced, the proposed HFLDD outperforms the baseline methods in terms of both test accuracy and communication cost.
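The partitioning goal, clusters whose members have skewed labels but whose pooled labels are balanced, can be sketched with a simple greedy heuristic. This is an illustrative assumption, not the clustering procedure of HFLDD: the function names, the max-minus-min imbalance measure, and the tie-breaking rule are all hypothetical.

```python
import numpy as np


def imbalance(hist):
    """Spread between the most and least frequent label in a histogram."""
    return float(hist.max() - hist.min())


def balanced_clusters(client_hists, num_clusters):
    """Greedily group clients so that each cluster's pooled labels are balanced.

    client_hists: (num_clients, num_labels) array of per-client label counts.
    Returns the member indices of each cluster and the pooled label counts.
    """
    hists = np.asarray(client_hists, dtype=float)
    totals = np.zeros((num_clusters, hists.shape[1]))
    members = [[] for _ in range(num_clusters)]
    # Place larger clients first; ties keep their original order.
    order = sorted(range(len(hists)), key=lambda i: -hists[i].sum())
    for i in order:
        def cost(k):
            # Change in imbalance if client i joins cluster k,
            # with the current cluster size as a tie-breaker.
            delta = imbalance(totals[k] + hists[i]) - imbalance(totals[k])
            return (delta, len(members[k]))

        best = min(range(num_clusters), key=cost)
        totals[best] += hists[i]
        members[best].append(i)
    return members, totals


# Four clients, each holding samples of a single label (severe label skew).
client_hists = [[10, 0], [0, 10], [10, 0], [0, 10]]
members, totals = balanced_clusters(client_hists, num_clusters=2)
print(members)  # [[0, 1], [2, 3]]
print(totals)   # each cluster pools 10 samples of each label
```

In this toy case every client is fully skewed, yet each resulting cluster holds both labels in equal proportion, which is the property the cluster-level training relies on.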
Abstract: Channel state information (CSI) is crucial for massive multi-input multi-output (MIMO) systems. As the antenna scale increases, acquiring CSI incurs significantly higher system overhead. In this letter, we propose a novel channel prediction method that utilizes wireless environmental information with pilot pattern optimization for CSI prediction (WEI-CSIP). Specifically, scatterers around the mobile station (MS) are abstracted from environmental information using multiview images. Then, an environmental feature map is extracted by a convolutional neural network (CNN). Additionally, a deep probabilistic subsampling (DPS) network acquires an optimal fixed pilot pattern. Finally, a CNN-based channel prediction network is designed to predict the complete CSI from the environmental feature map and partial CSI. Simulation results show that WEI-CSIP can reduce pilot overhead from 1/5 to 1/8 while improving prediction accuracy, with the normalized mean squared error reduced to 0.0113, an improvement of 83.2% compared with traditional channel prediction methods.
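The normalized mean squared error used as the accuracy metric can be computed as in the sketch below. The function name and the random complex channel matrix are assumptions for illustration; the values are not taken from the letter's simulations.

```python
import numpy as np


def nmse(h_true, h_pred):
    """Normalized mean squared error between a true and a predicted channel.

    Defined as ||h_true - h_pred||^2 / ||h_true||^2 over all entries,
    and valid for real or complex arrays of the same shape.
    """
    h_true = np.asarray(h_true)
    h_pred = np.asarray(h_pred)
    error_power = np.sum(np.abs(h_true - h_pred) ** 2)
    signal_power = np.sum(np.abs(h_true) ** 2)
    return float(error_power / signal_power)


# Placeholder channel: 32 antennas x 64 subcarriers of complex Gaussian entries.
rng = np.random.default_rng(0)
h = rng.standard_normal((32, 64)) + 1j * rng.standard_normal((32, 64))

print(nmse(h, h))          # a perfect prediction gives 0.0
print(nmse(h, 0.9 * h))    # a uniform 10% amplitude error gives about 0.01
```

Because the error is normalized by the channel power, the metric is comparable across scenarios with different path loss.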
Abstract: The AI-enabled autoencoder has demonstrated great potential for channel state information (CSI) feedback in frequency division duplex (FDD) multiple-input multiple-output (MIMO) systems. However, this method completely changes the existing feedback strategies, making it impractical to deploy in the near term. To address this issue, this paper proposes a channel-modeling-aided data augmentation method based on a limited amount of field channel data. Specifically, the user equipment (UE) extracts the primary stochastic parameters of the field channel data and transmits them to the base station (BS). The BS then updates the typical TR 38.901 model parameters with the extracted parameters. The updated channel model is then used to generate the dataset. This strategy jointly considers dataset collection, model generalization, model monitoring, and so on. Simulations verify that the proposed strategy can significantly improve performance compared with the benchmarks.
Abstract: As Extremely Large-Scale Multiple-Input-Multiple-Output (XL-MIMO) technology advances and frequency bands rise, near-field effects in communication are intensifying. A concise and accurate near-field XL-MIMO channel model serves as the cornerstone for investigating these near-field effects. However, existing angular-domain XL-MIMO channel models under near-field conditions require non-closed-form wave-number-domain integrals, which are complicated to compute. To obtain a more succinct channel model, this paper introduces a closed-form approximate expression based on the principle of stationary phase. It is subsequently shown that when the scatterer distance is larger than the array aperture, the closed-form model can be further simplified to a trapezoidal spectrum. We validate the accuracy of the proposed approximation through simulations of power angular spectrum similarity. The results indicate that the proposed approximation can accurately approximate the near-field angular-domain channel within the effective Rayleigh distance.
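For a sense of scale, the classical Rayleigh distance 2D²/λ, the conventional boundary between near field and far field, can be computed as below. Note the assumption: the abstract refers to the effective Rayleigh distance, a related but distinct quantity that this sketch does not compute; the function name and the example aperture and frequency are illustrative only.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def rayleigh_distance(aperture_m, frequency_hz):
    """Classical Rayleigh distance 2 * D^2 / wavelength, in metres."""
    wavelength = SPEED_OF_LIGHT / frequency_hz
    return 2.0 * aperture_m**2 / wavelength


# Example values (assumed): a 1 m array aperture operating at 30 GHz.
distance = rayleigh_distance(aperture_m=1.0, frequency_hz=30e9)
print(f"{distance:.1f} m")
```

The quadratic dependence on aperture and the linear dependence on frequency show why larger arrays and higher bands push more users into the near field.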
Abstract: World models simulate future states of the world in response to different actions. They facilitate interactive content creation and provide a foundation for grounded, long-horizon reasoning. Current foundation models do not fully meet the capabilities of general world models: large language models (LLMs) are constrained by their reliance on the language modality and their limited understanding of the physical world, while video models lack interactive action control over the world simulations. This paper takes a step toward building a general world model by introducing Pandora, a hybrid autoregressive-diffusion model that simulates world states by generating videos and allows real-time control with free-text actions. Pandora achieves domain generality, video consistency, and controllability through large-scale pretraining and instruction tuning. Crucially, Pandora bypasses the cost of training from scratch by integrating a pretrained LLM (7B) and a pretrained video model, requiring only additional lightweight finetuning. We present extensive outputs from Pandora across diverse domains (indoor/outdoor, natural/urban, human/robot, 2D/3D, etc.). The results indicate the great potential of building stronger general world models with larger-scale training.
Abstract: The task of reasoning over Knowledge Graphs (KGs) poses a significant challenge for Large Language Models (LLMs) due to the complex structure and large amounts of irrelevant information. Existing LLM reasoning methods overlook the importance of compositional learning on KGs for supplying precise knowledge. Besides, fine-tuning and frequent interaction with LLMs incur substantial time and resource costs. This paper focuses on the Question Answering over Knowledge Graph (KGQA) task and proposes an Explore-then-Determine (EtD) framework that synergizes LLMs with graph neural networks (GNNs) for reasoning over KGs. The Explore stage employs a lightweight GNN to explore promising candidates and fine-grained knowledge relevant to the questions, while the Determine stage utilizes the explored information to construct a knowledge-enhanced multiple-choice prompt, guiding a frozen LLM to determine the final answer. Extensive experiments on three benchmark KGQA datasets demonstrate that EtD achieves state-of-the-art performance and generates faithful reasoning results.
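The Determine stage's knowledge-enhanced multiple-choice prompt can be pictured with a minimal sketch. The prompt wording, the function name, and the example question, candidates, and triples are all hypothetical; the actual template used by EtD is not given in the abstract.

```python
def build_multiple_choice_prompt(question, candidates, facts):
    """Format candidates and supporting triples into a multiple-choice prompt.

    question:   natural-language question.
    candidates: answer entities proposed by an upstream explorer (e.g. a GNN).
    facts:      (head, relation, tail) triples judged relevant to the question.
    """
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    if len(candidates) > len(letters):
        raise ValueError("too many candidates for single-letter options")
    lines = ["Answer the question using the knowledge below.", "", "Knowledge:"]
    lines += [f"- {head} -> {relation} -> {tail}" for head, relation, tail in facts]
    lines += ["", f"Question: {question}", "Options:"]
    lines += [f"{letters[i]}. {cand}" for i, cand in enumerate(candidates)]
    lines += ["", "Reply with the letter of the best option."]
    return "\n".join(lines)


# Illustrative inputs only; in practice these would come from the Explore stage.
prompt = build_multiple_choice_prompt(
    question="Where was the director of Inception born?",
    candidates=["London", "Los Angeles", "Chicago"],
    facts=[
        ("Inception", "directed_by", "Christopher Nolan"),
        ("Christopher Nolan", "born_in", "London"),
    ],
)
print(prompt)
```

Restricting the LLM to a short list of lettered options keeps it frozen and needs only one call per question, in line with the cost concern raised in the abstract.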