National Mobile Communications Research Laboratory, Southeast University, Nanjing, China
Abstract:6G networks will introduce unprecedented complexity, which calls for a paradigm shift in network optimization and management. Artificial intelligence (AI)-based solutions, especially those enabled by the recently developed foundation models, have been recognized as promising candidates. Foundation models are large-scale AI models with general-purpose feature extraction capabilities, and once trained on massive amounts of data, they can be adapted to solve a wide range of downstream tasks, either in a zero-shot manner or with few-shot fine-tuning. This article provides a comprehensive overview of how foundation models are reshaping physical-layer processing and wireless resource management across three progressive paradigms. First, we examine the adaptation of off-the-shelf pre-trained foundation models to various wireless tasks. Second, we explore wireless-native foundation models, built from scratch on wireless data to bridge cross-domain modality gaps and capture universal wireless-domain physical characteristics. Third, we highlight agentic foundation models, which elevate static data processing into autonomous, reasoning-driven network orchestration. Furthermore, we discuss the impact of applying foundation models to emerging 6G frontiers, including integrated sensing and communications (ISAC), new multiple-input multiple-output (MIMO) architectures, semantic communications, and system-level network autonomy. Finally, we identify critical open challenges and opportunities, charting a promising path toward fully intelligent and adaptive wireless networks.
Abstract:Extreme data scarcity and inherent multipath spatial ambiguity severely limit existing deep learning-based channel state information (CSI) fingerprinting localization schemes for target unmanned aerial vehicles (UAVs). To overcome these challenges, we propose an end-to-end semi-supervised generative localization framework. First, by exploiting the temporal correlations inherent in continuous flight trajectories, a self-supervised encoder extracts robust spatial features from massive unlabeled CSI sequences to establish structured latent representations. Following this, we utilize a consistency model, a powerful derivative of diffusion architectures, as the core generative backbone to map the learned latent space to physical coordinates, jointly fine-tuning the pre-trained encoder with a strictly limited set of labeled CSI. This consistency formulation models the conditional distribution to resolve the mean collapse problem of discriminative models, while compressing the inference trajectory to 1-2 steps to avoid the latency bottleneck of traditional diffusion models. Furthermore, a lightweight distributed fusion mechanism is designed to aggregate spatial predictions across multiple base stations (BS) from a multi-view geometry perspective. Comprehensive evaluations on a real-world measurement dataset demonstrate that our framework achieves low latency and suppresses the mean localization error to 9.77 cm under a 3-BS fusion setup with only a 1% label fraction, significantly outperforming existing fully supervised and semi-supervised discriminative baselines.
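The mean collapse problem mentioned in this abstract can be illustrated with a toy example. The sketch below is not the paper's model or data: it assumes a hypothetical one-dimensional scene in which the same ambiguous CSI fingerprint is observed at two positions (the values in `modes` are invented), and shows that a squared-error point estimate lands between the two positions, whereas a sample from the conditional distribution lands on one of them.

```python
import random

random.seed(0)

# Hypothetical toy scene: one ambiguous fingerprint maps to two UAV positions
# (x = -5 m and x = +5 m) with equal probability, plus small label noise.
modes = [-5.0, 5.0]
labels = [random.choice(modes) + random.gauss(0.0, 0.1) for _ in range(10_000)]

def mse(pred):
    """Average squared error of a single point prediction over all labels."""
    return sum((y - pred) ** 2 for y in labels) / len(labels)

# A discriminative regressor trained with squared error converges to the
# conditional mean, because the mean minimizes the average squared error.
mse_optimal = sum(labels) / len(labels)
gap_to_nearest_mode = min(abs(mse_optimal - m) for m in modes)

# A generative model draws from the conditional distribution instead;
# picking one mode at random mimics that behaviour here.
generative_sample = random.choice(modes)

print(f"MSE-optimal point estimate: {mse_optimal:+.2f} m")
print(f"distance from that estimate to the nearest true position: {gap_to_nearest_mode:.2f} m")
print(f"generative sample: {generative_sample:+.1f} m")
```

The point estimate is optimal in the squared-error sense yet corresponds to a position the UAV never occupies, which is the failure the abstract attributes to discriminative models.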
Abstract:Superimposed pilot (SIP) transmission improves spectral efficiency by eliminating the dedicated pilot overhead required in orthogonal pilot (OP)-based schemes. However, SIP suffers from severe pilot-data coupling, which leads to a critical performance-complexity bottleneck at the receiver. To address this issue, this paper proposes a low-overhead transmission framework that revitalizes data-dependent superimposed training (DDST) with enhanced interference mitigation strategies. First, for quasi-static block-fading channels, an enhanced DDST receiver is developed to achieve non-iterative pilot-data decoupling by exploiting data-dependent algebraic structures. Second, to overcome the sensitivity of conventional DDST to channel variations and symbol misidentification in fast time-varying environments, a mix transmission scheme is developed. By strategically applying DDST to a subset of resource elements, the proposed scheme combines the interference-free transmission property of OP with the zero-pilot-overhead advantage of SIP, thereby improving demapping reliability and interference suppression. Furthermore, under the proposed mix scheme, a Vision Transformer-based neural receiver is designed to capture the orthogonal structure between pilots and perturbation-bearing data, as well as the underlying channel correlations, thereby relaxing the stringent quasi-static assumption required for interference disentanglement. Simulation results demonstrate that the proposed framework achieves significant performance gains in the low-to-medium SNR regime under time-varying channels while providing superior computational efficiency compared with state-of-the-art SIP receivers.
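The data-dependent structure behind DDST can be sketched in a few lines. This is a minimal illustration of the classic DDST construction, not the enhanced receiver or the mix scheme proposed in the paper. It assumes a periodic pilot of period `P` superimposed on a block of `P * Q` symbols; the block sizes, the QPSK alphabet, and the chirp-like pilot are arbitrary choices for the example. Subtracting the cyclic mean of the data before transmission removes the data contribution from the period-averaged signal, so the averaged signal contains only the pilot.

```python
import cmath
import random

random.seed(1)
P, Q = 4, 8          # pilot period and number of periods (assumed values)
N = P * Q            # block length

# Random unit-energy QPSK data symbols.
qpsk = [cmath.exp(1j * cmath.pi * (2 * k + 1) / 4) for k in range(4)]
data = [random.choice(qpsk) for _ in range(N)]

# Periodic pilot with period P, superimposed on every sample.
pilot_period = [cmath.exp(2j * cmath.pi * k * k / P) for k in range(P)]
pilot = [pilot_period[n % P] for n in range(N)]

def cyclic_mean(x):
    """Average the samples that share the same position within a pilot period."""
    return [sum(x[k + q * P] for q in range(Q)) / Q for k in range(P)]

# Data-dependent sequence: subtract the cyclic mean of the data so that the
# data contributes nothing to the period-averaged transmit signal.
data_mean = cyclic_mean(data)
tx = [data[n] - data_mean[n % P] + pilot[n] for n in range(N)]

# After period averaging, only the pilot survives.
residual = max(abs(a - b) for a, b in zip(cyclic_mean(tx), pilot_period))
print(f"max data leakage onto the averaged pilot: {residual:.2e}")
```

The cost of this decoupling is the perturbation `data_mean` added to every data symbol, which the receiver must account for when demapping. This corresponds to the symbol misidentification sensitivity noted in the abstract.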
Abstract:User localization and beam management are tightly linked in extremely large-scale multiple-input multiple-output (XL-MIMO) systems, especially in dense low-altitude economy (LAE) scenarios. However, the near-field propagation in XL-MIMO introduces strong distance sensitivity and complex spatial coupling, which makes joint trajectory and beam prediction challenging. Meanwhile, large language models (LLMs) have attracted attention in physical-layer transmission for modeling long-range dependencies. In this paper, we propose NF-TrackLLM, a multi-modal semantic-aware framework for near-field unmanned aerial vehicle (UAV) positioning and beam prediction in XL-MIMO systems. By incorporating visual and LiDAR sensing into a Sionna-based channel generation pipeline, environmental semantics and GPS are utilized to guide trajectory and beam prediction. Built upon the aligned multi-modal representation, a GPT-2-based spatiotemporal reasoning backbone and a cascaded prediction strategy are employed, where future trajectories are first inferred and then used as geometric priors to guide beam prediction. Simulation results demonstrate that NF-TrackLLM achieves accurate beam prediction and reliable UAV trajectory tracking in dense urban low-altitude scenarios.
Abstract:Digital twins (DTs) are promising for wireless deployment, optimization, and data generation, but building a propagation-faithful twin from sparse real measurements remains difficult. This paper proposes a wireless environment digital twin (WEDT) construction paradigm that evolves a reconstructed geometric DT into a propagation-consistent wireless environment representation through calibration of a scene-level electromagnetic (EM) property field. Instead of directly fitting link-specific channel responses, the proposed paradigm first constructs a geometry-prior Bayesian channel map (BCM) to convert sparse position-labeled channel state information (CSI) into dense probabilistic supervision with uncertainty estimates. It then embeds the learnable EM property field into differentiable ray tracing (RT) based channel computation, thereby enabling calibration through an explicit propagation chain. Experiments in both public and real-world scenes show that WEDT achieves accurate channel prediction, generalizes to unseen transceiver topologies, and remains effective across different sampling conditions. WEDT also offers utility for material-related environment sensing, more reliable physical-layer planning, and higher-quality synthetic data generation for wireless AI. These results demonstrate the value of the proposed paradigm for propagation-consistent WEDT construction and related wireless applications.
Abstract:Since the beam squint and near-field effects both inherently exist in upper-6 GHz (U6G) extremely large-scale multiple-input multiple-output (XL-MIMO) systems, wideband near-field channel estimation faces severe challenges, such as high computational complexity and high pilot overhead, particularly in hybrid architectures with fewer radio frequency (RF) chains. To reduce the complexity and the number of pilots, the parametric symmetry of wideband near-field channels is explored, such that the channel parameters, including angle, distance, and range, can be decoupled based on the delay variations observed by different antennas. Based on this, a distributed parametric symmetry-based (DPS) algorithm, applicable to U6G XL-MIMO, is proposed. First, the delays observed by different subarrays are estimated and extrapolated across the local processing units (LPUs). Then, the channel parameters are decoupled and estimated at the central processing unit (CPU) by only linearly combining the delays from different LPUs. The path gains are calculated at the individual LPUs to reconstruct the channel with low complexity. Since the proposed algorithm does not rely on scanning the polar-domain dictionary, only a single pilot is required even with hybrid architectures. Furthermore, the computational complexity, multiple-path resolution, Cramér-Rao lower bound (CRLB), and lower bound (LB) of the estimates in hybrid architectures and the DPS algorithm, respectively, are analyzed to evaluate the realizable potential of the proposed algorithm. Simulation results show that the proposed algorithm achieves higher estimation accuracy while requiring lower complexity and fewer pilots.
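A small numerical example can show why delays observed at different antennas allow angle and distance to be separated by linear combinations. The sketch below is not the DPS algorithm. It assumes a single line-of-sight path, noiseless delays, a linear array with a mirrored antenna pair at positions `+d` and `-d` around the array centre, and the second-order (Fresnel) expansion of the spherical wavefront. All geometry values are invented for illustration.

```python
import math

C = 299_792_458.0    # speed of light in m/s

def delay(r, theta, x):
    """Exact propagation delay from a source at range r and angle theta
    (measured from broadside) to an antenna at position x on a linear array."""
    return math.sqrt(r * r + x * x - 2.0 * r * x * math.sin(theta)) / C

# Assumed toy geometry, not taken from the paper.
r_true, theta_true, d = 20.0, math.radians(30.0), 0.5

tau_0 = delay(r_true, theta_true, 0.0)   # array centre
tau_p = delay(r_true, theta_true, +d)    # antenna at +d
tau_m = delay(r_true, theta_true, -d)    # mirrored antenna at -d

# Fresnel expansion: tau(x) ~ (r - x*sin(theta) + x^2*cos^2(theta) / (2*r)) / C.
# The odd (difference) combination cancels the distance-dependent term and
# isolates the angle.
sin_theta_hat = C * (tau_m - tau_p) / (2.0 * d)
theta_hat = math.asin(sin_theta_hat)

# The even (sum) combination cancels the angle-dependent linear term and
# isolates the wavefront curvature, which gives the distance.
curvature = C * (tau_p + tau_m - 2.0 * tau_0)
r_hat = d * d * math.cos(theta_hat) ** 2 / curvature

print(f"angle: true {math.degrees(theta_true):.3f} deg, estimate {math.degrees(theta_hat):.3f} deg")
print(f"range: true {r_true:.3f} m, estimate {r_hat:.3f} m")
```

The remaining estimation error comes from the truncated higher-order terms of the expansion, and it shrinks as the source moves farther from the array relative to `d`.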
Abstract:The inherent randomness of communication symbols creates a fundamental tension in Integrated Sensing and Communications (ISAC). On the one hand, they enable data transmission while allowing sensing to fully reuse communication resources. On the other hand, their randomness induces waveform-dependent fluctuations that directly affect sensing accuracy. This paper investigates a foundational question arising from this tradeoff: \textit{How does the modulation waveform affect the ranging Cramér--Rao Bound (CRB) when sensing reuses random data symbols?} We address this question by revealing a structural factorization of the Fisher information matrix (FIM) for joint delay-amplitude estimation, which separates the deterministic Jacobian of the target geometry from the random frequency-domain signal power induced by the data symbols. This structure yields a Jensen-type universal lower bound on the CRB, which is exactly attained by CP-OFDM under PSK constellations. For QAM and broader sub-Gaussian constellations, we develop an asymptotic perturbation analysis of the inverse FIM and prove that, when the number of transmitted symbols $N$ grows large, CP-OFDM achieves a lower ranging CRB than any frequency-spread orthogonal waveform over the almost-sure event where the random FIM is invertible. This superiority is further extended to amplitude estimation and full joint delay-amplitude estimation. We also characterize the local geometry of the stochastic CRB minimization problem over the unitary group. The analysis reveals that CP-OFDM is a stationary point for finite $N$, and its Riemannian Hessian is positive semidefinite for sufficiently large $N$, establishing its asymptotic local optimality. Numerical results confirm that OFDM outperforms representative waveforms including SC, OTFS, and AFDM.
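The Jensen-type bound in this abstract can be checked numerically in a heavily simplified setting. The sketch below is not the paper's joint delay-amplitude analysis. It assumes a scalar delay-only Fisher information with known amplitude, drops all physical constants, and places the data symbols directly on the subcarriers as in CP-OFDM. Under these assumptions the Fisher information is a weighted sum of per-subcarrier symbol powers, so constant-modulus PSK meets the bound `1 / E[F]` exactly, while 16-QAM has random symbol power and a strictly larger average CRB. The subcarrier count and trial numbers are arbitrary.

```python
import cmath
import random

random.seed(0)
N = 64                                             # number of subcarriers (assumed)
weights = [(k - N // 2) ** 2 for k in range(N)]    # squared frequency offsets, constants dropped

psk = [cmath.exp(2j * cmath.pi * m / 8) for m in range(8)]
levels = [-3, -1, 1, 3]
scale = 10 ** -0.5                                 # normalizes 16-QAM to unit average power
qam = [scale * complex(a, b) for a in levels for b in levels]

def delay_fisher_info(symbols):
    """Toy delay Fisher information: frequency weights times per-subcarrier power."""
    return sum(w * abs(s) ** 2 for w, s in zip(weights, symbols))

def mean_crb(constellation, trials):
    """Monte Carlo average of the CRB 1/F over random symbol draws."""
    total = 0.0
    for _ in range(trials):
        symbols = [random.choice(constellation) for _ in range(N)]
        total += 1.0 / delay_fisher_info(symbols)
    return total / trials

jensen_bound = 1.0 / sum(weights)                  # 1 / E[F] for unit-power symbols
crb_psk = mean_crb(psk, trials=200)
crb_qam = mean_crb(qam, trials=5000)

print(f"PSK average CRB / bound:    {crb_psk / jensen_bound:.6f}")
print(f"16-QAM average CRB / bound: {crb_qam / jensen_bound:.6f}")
```

The gap for QAM comes from the convexity of `1 / F`: fluctuations of `F` around its mean raise the average of its inverse.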
Abstract:Smart glasses are emerging as a promising interface between humans and artificial intelligence (AI) agents, enabling first-person perception, contextual awareness, and real-time assistance. However, continuous offloading of visual data from wearable devices to cloud-based vision-language models (VLMs) is fundamentally constrained by limited wireless bandwidth and energy resources. This paper proposes an intention-aware semantic agent communication framework for AI glasses, where data transmission is guided by user intention rather than raw pixel fidelity. In the proposed architecture, AI glasses act as an edge semantic agent while a server-side VLM executes high-level cognition and reasoning. The server-side VLM infers the user intention from the currently transmitted content and the historical prompts. Driven by specific user intentions, the glasses adaptively preserve textual content, document layout, or object semantics before transmission. We evaluate three representative scenarios with different lightweight preprocessing tools on the AI glasses. Simulation results demonstrate that, depending on the current task, intention-aware preprocessing achieves more than 50% bandwidth reduction while maintaining task performance. Moreover, semantic transmission exhibits graceful degradation under low SNRs. The findings demonstrate that aligning communication resources with user intention is essential for robust and efficient wearable AI agent systems.
Abstract:The increasing deployment of agentic artificial intelligence (AI) systems has intensified the demand for efficient agent-to-agent communication, particularly over bandwidth-limited wireless links. In embodied AI applications, agents must exchange task-related information under strict latency and reliability constraints. Existing agent communication methods primarily focus on connectivity and protocol efficiency, but lack effective mechanisms to reduce physical-layer transmission overhead while preserving task semantics. To address this challenge, this paper proposes a semantic agent communication framework that reduces communication overhead while maintaining task performance and shared understanding among agents. An LLM-based semantic processor is first introduced to reorganize and condense agent-generated messages by extracting task-relevant semantic content. To cope with information loss introduced by aggressive message reduction, an importance-aware semantic transmission strategy is developed, which adaptively protects semantic components according to their task importance. Furthermore, a task-specific knowledge base is incorporated as long-term semantic memory to support recurring tasks and further reduce bandwidth consumption with minimal performance degradation. Experimental results and ablation studies demonstrate that the proposed framework achieves nearly 50% bandwidth reduction with negligible loss in task completion performance compared to conventional transmission schemes.
Abstract:The aggressive densification of modern wireless networks necessitates judicious resource allocation to mitigate severe mutual interference. However, classical iterative algorithms remain computationally prohibitive for real-time applications requiring rapid responsiveness. While recent deep learning-based methods show promise, they typically function as task-specific solvers lacking the flexibility to adapt to different objectives and scenarios without expensive retraining. To address these limitations, we propose a graph foundation model for resource allocation (GFM-RA) based on a pre-training and fine-tuning paradigm to extract unified representations, thereby enabling rapid adaptation to different objectives and scenarios. Specifically, we introduce an interference-aware Transformer architecture with a bias projector that injects interference topologies into global attention mechanisms. Furthermore, we develop a hybrid self-supervised pre-training strategy that synergizes masked edge prediction with negative-free Teacher-Student contrastive learning, enabling the model to capture transferable structural representations from massive unlabeled datasets. Extensive experiments demonstrate that the proposed framework achieves state-of-the-art performance and scales effectively with increased model capacity. Crucially, leveraging its unified representations, the foundation model exhibits exceptional sample efficiency, enabling robust few-shot adaptation to diverse and unsupervised downstream objectives in out-of-distribution (OOD) scenarios. These results demonstrate the promise of pre-trained foundation models for adaptable wireless resource allocation and provide a strong foundation for future research on generalizable learning-based wireless optimization.
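Injecting an interference topology into global attention can be sketched as an additive bias on the attention scores. The example below is an assumption-laden illustration, not the GFM-RA architecture. The learned bias projector is replaced by a fixed log-gain map, the three-link gain matrix is invented, and all node embeddings are identical so that any difference in the attention weights comes from the bias alone.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def biased_attention(queries, keys, bias):
    """Scaled dot-product attention weights with an additive per-pair bias."""
    dim = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) + bias[i][j]
                  for j, k in enumerate(keys)]
        weights.append(softmax(scores))
    return weights

# Hypothetical 3-link network: entry [i][j] is the channel gain from
# transmitter j to receiver i (the diagonal holds the direct links).
gains = [[1.0, 0.5, 1e-6],
         [0.5, 1.0, 1e-6],
         [1e-6, 1e-6, 1.0]]

# Stand-in for a learned bias projector: a fixed scalar map from gain to bias.
bias = [[math.log(g) for g in row] for row in gains]

# Identical embeddings, so the attention pattern is driven by the bias only.
embeddings = [[1.0, 0.0] for _ in range(3)]
attn = biased_attention(embeddings, embeddings, bias)

for i, row in enumerate(attn):
    print(f"link {i} attends with weights {[round(w, 4) for w in row]}")
```

Links 0 and 1 interfere strongly and therefore attend to each other, while the nearly isolated link 2 attends almost entirely to itself. In a trained model the bias map would be learned jointly with the embeddings rather than fixed in advance.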