Abstract:Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, leading to their increasing deployment in wireless networks for a wide variety of user services. However, increasingly long prompts raise the crucial issues of computational resource demands and heavy communication load. To address this challenge, we propose Joint Power and Prompt Optimization (JPPO), a framework that combines Small Language Model (SLM)-based prompt compression with wireless power allocation optimization. By deploying an SLM at user devices for prompt compression and employing Deep Reinforcement Learning for joint optimization of compression ratio and transmission power, JPPO effectively balances service quality with resource efficiency. Experimental results demonstrate that our framework achieves high service fidelity and low bit error rates while optimizing power usage in wireless LLM services. The system reduces response time by about 17%, with the improvement varying based on the length of the original prompt.
Abstract:Integrated sensing and communication (ISAC) unifies wireless communication and sensing by sharing spectrum and hardware, which often incurs trade-offs between the two functions due to limited resources. However, this paper shifts focus to exploring the synergy between communication and sensing, using WiFi sensing as an exemplary scenario where communication signals are repurposed to probe the environment without dedicated sensing waveforms, followed by data uploading to the edge server for inference. While increased device participation enhances multi-view sensing data, it also imposes significant communication overhead between devices and the edge server. To address this challenge, we aim to maximize the sensing task performance, measured by mutual information, under the channel capacity constraint. The information-theoretic optimization problem is solved by the proposed ADE-MI, a novel framework that employs a two-stage optimization approach: (1) adaptive distributed encoding (ADE) at the device, which ensures transmitted bits are most relevant to sensing tasks, and (2) multi-view inference (MI) at the edge server, which orchestrates multi-view data from distributed devices. Our experimental results highlight the synergy between communication and sensing, showing that more frequent communication from WiFi access points to edge devices improves sensing inference accuracy. The proposed ADE-MI achieves 92\% recognition accuracy with over $10^4$-fold reduction in latency compared to schemes with raw data communication, achieving both high sensing inference accuracy and low communication latency simultaneously.
Abstract:The development of sixth-generation (6G) wireless communication systems demands innovative solutions to address challenges in the deployment of a large number of base stations and the detection of multi-band signals. Quantum technology, specifically nitrogen vacancy (NV) centers in diamonds, offers promising potential for the development of compact, robust receivers capable of supporting multiple users. For the first time, we propose a multiple access scheme using fluorescent nanodiamonds (FNDs) containing NV centers as nano-antennas. The unique response of each FND to applied microwaves allows for distinguishable patterns of fluorescence intensities, enabling multi-user signal demodulation. We demonstrate the effectiveness of our FNDs-implemented receiver by simultaneously transmitting two uncoded digitally modulated information bit streams from two separate transmitters, achieving a low bit error ratio. Moreover, our design supports tunable frequency band communication and reference-free signal decoupling, reducing communication overhead. Furthermore, we implement a miniaturized device comprising all essential components, highlighting its practicality as a receiver serving multiple users simultaneously. This approach paves the way for the integration of quantum sensing technologies in future 6G wireless communication networks.
Abstract:Leveraging the strong atom-light interaction, Rydberg atomic receivers significantly enhance the sensitivity of electromagnetic signal measurements, outperforming traditional antennas. Existing research primarily focuses on improving the architecture and signal detection algorithms of atomic receivers, while established signal processing schemes at the transmitter end have remained unchanged. However, these schemes fail to maximize the throughput of atomic receivers due to the nonlinearity of the transmission model. To address this issue, we propose to design transmitter precoding in multiple-input multiple-output (MIMO) systems to achieve the capacity of atomic receivers. Initially, we harness a strong reference approximation to convert the nonlinear magnitude-detection model of atomic receivers into a linear real-part detector. Based on this approximation, we prove that the degree of freedom is min{Nr/2,Nt} for a MIMO system comprising an Nr-antenna atomic receiver and an Nt-antenna classic transmitter. To achieve the system capacity, we propose an IQ-aware fully digital precoding method. Unlike traditional complex-valued digital precoders that jointly manipulate the in-phase and quadrature (IQ) symbols, our method employs four real matrices to independently precode the IQ baseband symbols, which is shown to be optimal for atomic receivers. Then, to eliminate the reliance on a fully digital precoding architecture, we further explore IQ-aware hybrid precoding techniques. Our design incorporates a low-dimensional IQ-aware digital precoder and a high-dimensional complex analog precoder. Alternating minimization algorithms are proposed to produce IQ-aware hybrid precoders, with the objective of approaching the optimal IQ-aware fully digital precoder. Simulation results validate the superiority of the proposed IQ-aware precoding methods over existing techniques in atomic MIMO communications.
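The strong reference approximation mentioned in the abstract above can be checked numerically. The following is a minimal sketch, assuming a single receive element, no noise, and a reference whose magnitude is much larger than the signal's; the function names and the numeric values are hypothetical and are not taken from the paper. It compares the magnitude detector |r + s| against its first-order expansion |r| + Re{s e^{-j∠r}}, which is linear in the real part of the rotated signal.

```python
import cmath

def magnitude_detector(reference, signal):
    # Nonlinear measurement: only the magnitude of the superposed field is observed.
    return abs(reference + signal)

def real_part_detector(reference, signal):
    # First-order expansion for |reference| >> |signal| (assumed regime):
    # rotate the signal by the reference phase and keep its real part.
    rotated = signal * cmath.exp(-1j * cmath.phase(reference))
    return abs(reference) + rotated.real

# Hypothetical values: reference is 200x stronger than the signal.
reference = cmath.rect(100.0, 0.7)
signal = complex(0.3, -0.4)

exact = magnitude_detector(reference, signal)
approx = real_part_detector(reference, signal)
error = abs(exact - approx)
print(exact, approx, error)
```

The residual is second order in the signal amplitude, roughly the squared quadrature component divided by twice the reference magnitude, so it shrinks as the reference grows.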
Abstract:The forthcoming generation of wireless technology, 6G, promises a revolutionary leap beyond traditional data-centric services. It aims to usher in an era of ubiquitous intelligent services, where everything is interconnected and intelligent. This vision requires the seamless integration of three fundamental modules: sensing for information acquisition, communication for information sharing, and computation for information processing and decision-making. These modules are intricately linked, especially in complex tasks such as edge learning and inference. However, the performance of these modules is interdependent, creating competition for time, energy, and bandwidth resources. Existing techniques like integrated communication and computation (ICC), integrated sensing and computation (ISC), and integrated sensing and communication (ISAC) have made partial strides in addressing this challenge, but they fall short of meeting the extreme performance requirements. To overcome these limitations, it is essential to develop new techniques that comprehensively integrate sensing, communication, and computation. This integrated approach, known as Integrated Sensing, Communication, and Computation (ISCC), offers a systematic perspective for enhancing task performance. This paper begins with a comprehensive survey of historic and related techniques such as ICC, ISC, and ISAC, highlighting their strengths and limitations. It then explores the state-of-the-art signal designs for ISCC, along with network resource management strategies specifically tailored for ISCC. Furthermore, this paper discusses the exciting research opportunities that lie ahead for implementing ISCC in future advanced networks. By embracing ISCC, we can unlock the full potential of intelligent connectivity, paving the way for groundbreaking applications and services.
Abstract:The emergence of large-scale foundation models (FoMo's) that can perform human-like intelligence motivates their deployment at the network edge for devices to access state-of-the-art artificial intelligence. For better user experiences, the pre-trained FoMo's need to be adapted to specialized downstream tasks through fine-tuning techniques. To transcend a single device's memory and computation limitations, we advocate multi-device cooperation within the device-edge cooperative fine-tuning (DEFT) paradigm, where edge devices cooperate to simultaneously optimize different parts of fine-tuning parameters within a FoMo. However, the parameter blocks reside at different depths within a FoMo architecture, leading to varied computation latency-and-memory cost due to gradient backpropagation-based calculations. The heterogeneous on-device computation and memory capacities and channel conditions necessitate an integrated communication-and-computation allocation of local computation loads and communication resources to achieve low-latency (LoLa) DEFT. To this end, we consider the depth-aware DEFT block allocation problem. The involved optimal block-device matching is tackled by the proposed low-complexity Cutting-RecoUNting-CHecking (CRUNCH) algorithm, which is designed by exploiting the monotone-increasing property between block depth and computation latency-and-memory cost. Next, the joint bandwidth-and-block allocation makes the problem more sophisticated. We observe a splittable Lagrangian expression through the transformation and analysis of the original problem, where the variables indicating device involvement are introduced. Then, the dual ascent method is employed to tackle this problem iteratively. Through extensive experiments conducted on the GLUE benchmark, our results demonstrate significant latency reduction achievable by LoLa DEFT for fine-tuning a RoBERTa model.
Abstract:A novel paradigm of mobile edge generation (MEG)-enabled digital twin (DT) is proposed, which enables distributed on-device generation at mobile edge networks for real-time DT applications. First, an MEG-DT architecture is put forward to decentralize generative artificial intelligence (GAI) models onto edge servers (ESs) and user equipments (UEs), which has the advantages of low latency, privacy preservation, and individual-level customization. Then, various single-user and multi-user generation mechanisms are conceived for MEG-DT, which strike trade-offs between generation latency, hardware costs, and device coordination. Furthermore, to perform efficient distributed generation, two operating protocols are explored for transmitting interpretable and latent features between ESs and UEs, namely sketch-based generation and seed-based generation, respectively. Based on the proposed protocols, the convergence between MEG and DT is highlighted. Considering the seed-based image generation scenario, numerical case studies are provided to reveal the superiority of MEG-DT over centralized generation. Finally, promising applications and research opportunities are identified.
Abstract:In this paper, we study efficient multi-beam training design for near-field communications to reduce the beam training overhead of conventional single-beam training methods. In particular, the array-division based multi-beam training method, which is widely used in far-field communications, cannot be directly applied to the near-field scenario, since different sub-arrays may observe different user angles and there exist coverage holes in the angular domain. To address these issues, we first devise a new near-field multi-beam codebook by sparsely activating a portion of antennas to form a sparse linear array (SLA), hence generating multiple beams simultaneously by effectively exploiting the near-field grating lobes. Next, a two-stage near-field beam training method is proposed, in which several candidate user locations are first identified based on multi-beam sweeping over time, followed by a second stage that determines the true user location with a small number of single-beam sweeps. Finally, numerical results show that our proposed multi-beam training method significantly reduces the beam training overhead of conventional single-beam training methods, while achieving comparable rate performance in data transmission.
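To illustrate why sparse activation yields several beams at once, the sketch below evaluates the normalized array gain of a uniformly spaced sparse array. It is a far-field simplification and not the near-field codebook of the abstract above: it assumes plane-wave steering, an element spacing of `sparsity` half-wavelengths, and hypothetical function and parameter names. Under these assumptions the gain is periodic in sin(angle) with period 2/`sparsity`, so a beam steered to one direction reappears at the grating-lobe directions.

```python
import cmath
import math

def array_gain(n_active, sparsity, steer_sin, look_sin):
    # Far-field, plane-wave model (assumption): active elements sit at
    # multiples of sparsity * (half wavelength), so the per-element phase
    # step is pi * sparsity * (sin(look) - sin(steer)).
    total = sum(
        cmath.exp(1j * math.pi * sparsity * n * (look_sin - steer_sin))
        for n in range(n_active)
    )
    return abs(total) / n_active

n_active, sparsity = 16, 4
main_lobe = array_gain(n_active, sparsity, 0.0, 0.0)
# Grating lobe: sin(angle) shifted by 2 / sparsity from the steering direction.
grating_lobe = array_gain(n_active, sparsity, 0.0, 2.0 / sparsity)
# Midway between the two lobes the contributions cancel.
between_lobes = array_gain(n_active, sparsity, 0.0, 1.0 / sparsity)
print(main_lobe, grating_lobe, between_lobes)
```

In the near field the lobes also depend on range, which this sketch deliberately ignores.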
Abstract:Over-the-air computation (AirComp) leverages the signal-superposition characteristic of wireless multiple access channels to perform mathematical computations. Initially introduced to enhance communication reliability in interference channels and wireless sensor networks, AirComp has more recently found applications in task-oriented communications, namely, for wireless distributed learning and in wireless control systems. Its adoption aims to address latency challenges arising from an increased number of edge devices or IoT devices accessing the constrained wireless spectrum. This paper focuses on the physical layer of these systems, specifically on the waveform and the signal processing aspects at the transmitter and receiver to meet the challenges that AirComp presents within the different contexts and use cases.
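The signal-superposition idea in the abstract above can be sketched in a few lines. This is a minimal illustration, assuming flat-fading scalar channels that are perfectly known at each device, channel-inversion pre-equalization, and no transmit power limit; the function name and the example values are hypothetical. Each device divides its value by its own channel, so the channel's natural summation delivers the sum of the values in a single channel use.

```python
import random

def aircomp_average(values, channels, noise_std=0.0, rng=None):
    rng = rng or random.Random(0)
    # Each device pre-equalizes its symbol by inverting its own channel
    # (assumes perfect channel knowledge and no power constraint).
    transmitted = [x / h for x, h in zip(values, channels)]
    # The multiple access channel superimposes all transmissions.
    received = sum(h * s for h, s in zip(channels, transmitted))
    # Additive receiver noise, independent on the I and Q components.
    received += complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
    # The receiver scales the real part to estimate the arithmetic mean.
    return received.real / len(values)

values = [1.0, 2.0, 3.0]
channels = [0.5 + 0.5j, 1.0 - 2.0j, -0.3 + 0.1j]
noiseless_estimate = aircomp_average(values, channels)
noisy_estimate = aircomp_average(values, channels, noise_std=0.1)
print(noiseless_estimate, noisy_estimate)
```

Practical designs replace the unconstrained inversion with power-limited scaling, which is one of the waveform and signal processing questions the paper surveys.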
Abstract:Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm called edge model caching. In this paper, we develop a novel model placement scheme, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observation that a wide range of AI models, such as convolutional neural networks or large language models, can share a significant proportion of parameter blocks containing reusable knowledge, thereby improving storage efficiency. To this end, we formulate a parameter-sharing model placement problem to maximize the cache hit ratio in multi-edge wireless networks by balancing the fundamental tradeoff between storage efficiency and service latency. We show that the formulated problem is a submodular maximization problem with submodular constraints, for which no polynomial-time approximation algorithm exists. To overcome this challenge, we study an important special case, where a small fixed number of parameter blocks are shared across models, which often holds in practice. In such a case, a polynomial-time algorithm with $\left(1-\epsilon\right)/2$-approximation guarantee is developed. Subsequently, we address the original problem for the general case by developing a greedy algorithm. Simulation results demonstrate that the proposed TrimCaching framework significantly improves the cache hit ratio compared with state-of-the-art content caching without exploiting shared parameters in AI models.
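The storage benefit of shared parameter blocks described in the abstract above can be seen in a small example. The sketch below is not the paper's algorithm: it assumes a single edge server, ignores latency constraints, and uses a simple popularity-per-added-storage greedy rule; all names, sizes, and popularity values are hypothetical. The key point is that a model's marginal storage cost counts only the blocks not already cached.

```python
def greedy_placement(models, block_size, popularity, capacity):
    cached_blocks, placed, used = set(), [], 0
    remaining = set(models)
    while remaining:
        best, best_score, best_cost = None, 0.0, 0
        for name in sorted(remaining):
            # Only blocks that are not cached yet consume extra storage.
            new_blocks = models[name] - cached_blocks
            cost = sum(block_size[b] for b in new_blocks)
            if used + cost > capacity:
                continue
            score = popularity[name] / cost if cost > 0 else float("inf")
            if best is None or score > best_score:
                best, best_score, best_cost = name, score, cost
        if best is None:
            break  # nothing else fits
        placed.append(best)
        cached_blocks |= models[best]
        used += best_cost
        remaining.remove(best)
    return placed, used

# Hypothetical example: models A and B share a large backbone block.
models = {"A": {"backbone", "headA"}, "B": {"backbone", "headB"}, "C": {"other"}}
block_size = {"backbone": 8, "headA": 1, "headB": 1, "other": 9}
popularity = {"A": 0.4, "B": 0.3, "C": 0.3}
placed, used = greedy_placement(models, block_size, popularity, capacity=10)
hit_ratio = sum(popularity[m] for m in placed)
print(placed, used, hit_ratio)
```

Without sharing, each model would need 9 storage units and only one would fit in a capacity of 10; with the shared backbone stored once, both A and B fit.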