National Mobile Communications Research Laboratory, Southeast University, Nanjing, China
Abstract:Since the beam squint and near-field effects both inherently exist in upper-6 GHz (U6G) extremely large-scale multiple-input multiple-output (XL-MIMO) systems, wideband near-field channel estimation faces severe challenges, such as higher computational complexity, and higher pilot overhead particularly at hybrid architectures with fewer radio frequency (RF) chains. To precisely reduce the complexity and number of pilots, the parametric symmetry of wideband near-field channels is explored, such that the channel parameters, including angle, distance, and range, can be decoupled based on the delay variations observed by different antennas. Based on this, a distributed parametric symmetry-based (DPS) algorithm, applicable to U6G XL-MIMO, is proposed. The delays observed by different subarrays are estimated and extrapolated across the local processing units (LPUs) firstly, and then, the channel parameters are decoupled and estimated at the central processing unit (CPU), by only linearly combining the delays from different LPUs. The path gains are calculated at different LPUs, respectively, to reconstruct the channel with low complexity. Since the proposed algorithm does not rely on scanning the polar-domain dictionary, only a single pilot is required even with hybrid architectures. Furthermore, the computational complexity, multiple-path resolution, Cramer-Rao lower bound (CRLB) and lower bound (LB) of the estimates in hybrid architectures and the DPS algorithm, respectively, are analyzed, to evaluate the realizable potential of the proposed algorithm. The simulation results prove that the proposed algorithm has a higher estimation accuracy, while requiring less complexity and pilots.
Abstract:Digital twins (DTs) are promising for wireless deployment, optimization, and data generation, but building a propagation-faithful twin from sparse real measurements remains difficult. This paper proposes a wireless environment digital twin (WEDT) construction paradigm that evolves a reconstructed geometric DT into a propagation-consistent wireless environment representation through calibration of a scene-level electromagnetic (EM) property field. Instead of directly fitting link-specific channel responses, the proposed paradigm first constructs a geometry-prior Bayesian channel map (BCM) to convert sparse position-labeled channel state information (CSI) into dense probabilistic supervision with uncertainty estimates. It then embeds the learnable EM property field into differentiable ray tracing (RT) based channel computation, thereby enabling calibration through an explicit propagation chain. Experiments in both public and real-world scenes show that WEDT achieves accurate channel prediction, generalizes to unseen transceiver topologies, and remains effective across different sampling conditions. WEDT also offers utility for material-related environment sensing, more reliable physical-layer planning, and higher-quality synthetic data generation for wireless AI. These results demonstrate the value of the proposed paradigm for propagation-consistent WEDT construction and related wireless applications.
Abstract:The inherent randomness of communication symbols creates a fundamental tension in Integrated Sensing and Communications (ISAC). On the one hand, they enable data transmission while allowing sensing to fully reuse communication resources. On the other hand, their randomness induces waveform-dependent fluctuations that directly affect sensing accuracy. This paper investigates a foundational question arising from this tradeoff: \textit{How does the modulation waveform affect the ranging Cramér--Rao Bound (CRB) when sensing reuses random data symbols?} We address this question by revealing a structural factorization of the Fisher information matrix (FIM) for joint delay-amplitude estimation, which separates the deterministic Jacobian of the target geometry from the random frequency-domain signal power induced by the data symbols. This structure yields a Jensen-type universal lower bound on the CRB, which is exactly attained by CP-OFDM under PSK constellations. For QAM and broader sub-Gaussian constellations, we develop an asymptotic perturbation analysis of the inverse FIM and prove that, when the number of transmitted symbols $N$ grows large, CP-OFDM achieves a lower ranging CRB than any frequency-spread orthogonal waveform over the almost-sure event where the random FIM is invertible. This superiority is further extended to amplitude estimation and full joint delay-amplitude estimation. We also characterize the local geometry of the stochastic CRB minimization problem over the unitary group. The analysis reveals that CP-OFDM is a stationary point for finite $N$, and its Riemannian Hessian is positive semidefinite for sufficiently large $N$, establishing its asymptotic local optimality. Numerical results confirm that OFDM outperforms representative waveforms including SC, OTFS, and AFDM.
Abstract:Smart glasses are emerging as a promising interface between humans and artificial intelligence (AI) agents, enabling first-person perception, contextual awareness, and real-time assistance. However, continuous offloading of visual data from wearable devices to cloud-based vision-language models (VLMs) is fundamentally constrained by limited wireless bandwidth and energy resources. This paper proposes an intention-aware semantic agent communication framework for AI glasses, where data transmission is guided by user intention rather than raw pixel fidelity. In the proposed architecture, AI glasses act as an edge semantic agent while a server-side VLM executes high-level cognition and reasoning. The user intention can be inferred by the server-side VLM through the current transmitted content and the historical prompts. Driven by specific user intentions, the glasses adaptively preserve textual content, document layout, or object semantics before transmission. We evaluate three representative scenarios with different lightweight preprocessing tools on the AI glasses. Simulation results demonstrate that intention-aware preprocessing significantly achieves more than 50% bandwidth reduction depending on the current task while maintaining task performance. Moreover, semantic transmission exhibits graceful degradation under low SNRs. The findings demonstrate that aligning communication resources with user intention is essential for robust and efficient wearable AI agent systems.
Abstract:The increasing deployment of agentic artificial intelligence (AI) systems has intensified the demand for efficient agent to agent communication, particularly over bandwidth limited wireless links. In embodied AI applications, agents must exchange task related information under strict latency and reliability constraints. Existing agent communication methods primarily focus on connectivity and protocol efficiency, but lack effective mechanisms to reduce physical layer transmission overhead while preserving task semantics.To address this challenge, this paper proposes a semantic agent communication framework that reduces communication overhead while maintaining task performance and shared understanding among agents. An LLM based semantic processor is first introduced to reorganize and condense agent generated messages by extracting task relevant semantic content. To cope with information loss introduced by aggressive message reduction, an importance-aware semantic transmission strategy is developed, which adaptively protects semantic components according to their task importance. Furthermore, a task specific knowledge base is incorporated as long term semantic memory to support recurring tasks and further reduce bandwidth consumption with minimal performance degradation. Experimental results and ablation studies demonstrate that the proposed framework achieves nearly 50% bandwidth reduction with negligible loss in task completion performance compared to conventional transmission schemes.
Abstract:The aggressive densification of modern wireless networks necessitates judicious resource allocation to mitigate severe mutual interference. However, classical iterative algorithms remain computationally prohibitive for real-time applications requiring rapid responsiveness. While recent deep learning-based methods show promise, they typically function as task-specific solvers lacking the flexibility to adapt to different objectives and scenarios without expensive retraining. To address these limitations, we propose a graph foundation model for resource allocation (GFM-RA) based on a pre-training and fine-tuning paradigm to extract unified representations, thereby enabling rapid adaptation to different objectives and scenarios. Specifically, we introduce an interference-aware Transformer architecture with a bias projector that injects interference topologies into global attention mechanisms. Furthermore, we develop a hybrid self-supervised pre-training strategy that synergizes masked edge prediction with negative-free Teacher-Student contrastive learning, enabling the model to capture transferable structural representations from massive unlabeled datasets. Extensive experiments demonstrate that the proposed framework achieves state-of-the-art performance and scales effectively with increased model capacity. Crucially, leveraging its unified representations, the foundation model exhibits exceptional sample efficiency, enabling robust few-shot adaptation to diverse and unsupervised downstream objectives in out-of-distribution (OOD) scenarios. These results demonstrate the promise of pre-trained foundation models for adaptable wireless resource allocation and provide a strong foundation for future research on generalizable learning-based wireless optimization.
Abstract:Environment-aware 6G wireless networks demand the deep integration of multimodal and wireless data. However, most existing datasets are confined to 2D terrestrial far-field scenarios, lacking the 3D spatial context and near-field characteristics crucial for low-altitude extremely large-scale multiple-input multiple-output (XL-MIMO) systems. To bridge this gap, this letter introduces Multimodal-NF, a large-scale dataset and specialized generation framework. Operating in the upper midband, it synchronizes high-fidelity near-field channel state information (CSI) and precise wireless labels (e.g., Top-5 beam indices, LoS/NLoS) with comprehensive sensory modalities (RGB images, LiDAR point clouds, and GPS). Crucially, these multimodal priors provide spatial semantics that help reduce the near-field search space and thereby lower the overhead of wireless sensing and communication tasks. Finally, we validate the dataset through representative case studies, demonstrating its utility and effectiveness. The open-source generator and dataset are available at https://lmyxxn.github.io/6GXLMIMODatasets/.
Abstract:This paper investigates narrowband coordinated user scheduling in multi-cell massive multiple-input multiple-output (MIMO) systems. We formulate the problem under a spectral-efficiency maximization criterion, revealing inherent challenges in computational complexity and signaling overhead. To address these, we develop a user-scheduling-oriented CKM (US-CKM) and a US-CKM-driven two-stage coordinated scheduling framework. By exploiting the mapping between location information and statistical channel state information (SCSI), the system enables rapid SCSI retrieval and persistent reuse, substantially reducing CSI acquisition overhead. Embedding statistical channel correlation into the CKM further characterizes interuser interference patterns. The framework designs an intra-cell active-user selection scheme for the first stage and an inter-cell coordinated scheduling scheme for the second, both based on US-CKM entries. The first stage identifies users with favorable channel gains and low intra-cell interference, reducing the candidate set with marginal sum-rate loss. The second stage suppresses inter-cell interference (ICI) by exploiting cross-cell channel correlations. To enhance robustness against imperfect SCSI in dynamic scattering environments, we augment the framework with a reliability-guided mechanism. Instead of uniform treatment, we evaluate entry stability using a grid reliability metric quantifying channel measurement variance at sampling locations. Low-reliability grids are identified, and their instantaneous CSI is acquired in real time to integrate with existing SCSI. This process refines channel gain and spatial correlation characteristics, ensuring robust performance under imperfect conditions.
Abstract:In near-field extremely large-scale multiple-input multiple-output (XL-MIMO) systems, spherical wavefront propagation expands the traditional beam codebook into the joint angular-distance domain, rendering conventional beam training prohibitively inefficient, especially in complex 3-dimensional (3D) low-altitude environments. Furthermore, since near-field beam variations are deeply coupled not only with user positions but also with the physical surroundings, precise beam alignment demands profound environmental understanding capabilities. To address this, we propose a large language model (LLM)-driven multimodal framework that fuses historical GPS data, RGB image, LiDAR data, and strategically designed task-specific textual prompts. By utilizing the powerful emergent reasoning and generalization capabilities of the LLM, our approach learns complex spatial dynamics to achieve superior environmental comprehension...
Abstract:Accurate beam prediction is a key enabler for next-generation wireless communication systems. In this paper, we propose a multimodal large language model (LLM)-based beam prediction framework that effectively utilizes contextual information, provided by sensory data including RGB camera images and LiDAR point clouds. To effectively fuse heterogeneous modalities, we design specialized modality encoders together with a beam-guided attention masking mechanism and a high-frequency temporal alignment strategy, enabling robust cross-modal feature integration under dynamic environments. Furthermore, we construct a large-scale multimodal dataset for communication, named Multimodal-Wireless, which covers diverse weather and traffic conditions with high-fidelity ray-tracing labels. Extensive simulation results demonstrate that the proposed approach significantly reduces the reliance on oracle angle-of-departure knowledge and consistently outperforms state-of-the-art multimodal LLM-based beam prediction methods in terms of beam accuracy and communication performance, improving the average Top-1 accuracy to 80.8% and the average normalized gain to 89.1%.