Abstract:This work considers an uplink asynchronous massive random access scenario in which a large number of users asynchronously access a base station equipped with multiple receive antennas. The objective is to alleviate the problem of massive collision due to the limited number of orthogonal preambles of an access scheme in which user activity detection is performed. We propose a user activity detection with delay-calibration (UAD-DC) algorithm and investigate the benefits of oversampling for the estimation of continuous time delays at the receiver. The proposed algorithm iteratively estimates time delays and detects active users by noting that the collided users can be identified through accurate estimation of time delays. Due to the sporadic user activity patterns, the user activity detection problem can be formulated as a compressive sensing (CS) problem, which can be solved by a modified Turbo-CS algorithm under the consideration of correlated noise samples resulting from oversampling. A sliding-window technique is applied in the proposed algorithm to reduce the overall computational complexity. Moreover, we propose a new design of the pulse shaping filter by minimizing the Bayesian Cram\'er-Rao bound of the detection problem under the constraint of limited spectral bandwidth. Numerical results demonstrate the efficacy of the proposed algorithm in terms of the normalized mean squared error of the estimated channel, the probability of misdetection and the successful detection ratio.
Abstract:This paper introduces Bifr\"ost, a novel 3D-aware framework built upon diffusion models to perform instruction-based image composition. Previous methods concentrate on image compositing at the 2D level and therefore fall short in handling complex spatial relationships ($\textit{e.g.}$, occlusion). Bifr\"ost addresses these issues by training a multimodal large language model (MLLM) as a 2.5D location predictor and integrating depth maps as an extra condition during the generation process to bridge the gap between 2D and 3D, which enhances spatial comprehension and supports sophisticated spatial interactions. Our method begins by fine-tuning the MLLM with a custom counterfactual dataset to predict 2.5D object locations in complex backgrounds from language instructions. Then, the image-compositing model is designed to process multiple types of input features, enabling it to perform high-fidelity image compositions that consider occlusion, depth blur, and image harmonization. Extensive qualitative and quantitative evaluations demonstrate that Bifr\"ost significantly outperforms existing methods, providing a robust solution for generating realistically composed images in scenarios demanding intricate spatial understanding. This work not only pushes the boundaries of generative image compositing but also reduces reliance on expensive annotated datasets by effectively utilizing existing resources in innovative ways.
Abstract:Existing diffusion-based methods for inverse problems sample from the posterior using score functions and accept the generated random samples as solutions. In applications where the posterior mean is preferred, multiple samples must be generated from the posterior, which is time-consuming. In this work, by analyzing the probability density evolution of the conditional reverse diffusion process, we prove that the posterior mean can be obtained by tracking the mean of each reverse diffusion step. Based on that, we establish a framework termed reverse mean propagation (RMP) that targets the posterior mean directly. We show that RMP can be implemented by solving a variational inference problem, which can be further decomposed into minimizing a reverse KL divergence at each reverse step. We further develop an algorithm that optimizes the reverse KL divergence with natural gradient descent using score functions and propagates the mean at each reverse step. Experiments demonstrate the validity of the theory of our framework and show that our algorithm outperforms state-of-the-art algorithms in reconstruction performance with lower computational complexity in various inverse problems.
Abstract:This paper addresses the problem of end-to-end (E2E) design of learning and communication in a task-oriented semantic communication system. In particular, we consider a multi-device cooperative edge inference system over a wireless multiple-input multiple-output (MIMO) multiple access channel, where multiple devices transmit extracted features to a server to perform a classification task. We formulate the E2E design of feature encoding, MIMO precoding, and classification as a conditional mutual information maximization problem. However, it is notoriously difficult to design and train an E2E network that can be adaptive to both the task dataset and different channel realizations. Regarding network training, we propose a decoupled pretraining framework that separately trains the feature encoder and the MIMO precoder, with a maximum a posteriori (MAP) classifier employed at the server to generate the inference result. The feature encoder is pretrained exclusively using the task dataset, while the MIMO precoder is pretrained solely based on the channel and noise distributions. Nevertheless, we manage to align the pretraining objectives of each individual component with the E2E learning objective, so as to approach the performance bound of E2E learning. By leveraging the decoupled pretraining results for initialization, the E2E learning can be conducted with minimal training overhead. Regarding network architecture design, we develop two deep unfolded precoding networks that effectively incorporate the domain knowledge of the solution to the decoupled precoding problem. Simulation results on both the CIFAR-10 and ModelNet10 datasets verify that the proposed method achieves significantly higher classification accuracy compared to various baselines.
Abstract:In this paper, we propose a learning-based block-wise planar channel estimator (LBPCE) with high accuracy and low complexity to estimate the time-varying frequency-selective channel of a multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) system. First, we establish a block-wise planar channel model (BPCM) to characterize the correlation of the channel across subcarriers and OFDM symbols. Specifically, adjacent subcarriers and OFDM symbols are divided into several sub-blocks, and an affine function (i.e., a plane) with only three variables (namely, mean, time-domain slope, and frequency-domain slope) is used to approximate the channel in each sub-block, which significantly reduces the number of variables to be determined in channel estimation. Second, we design a 3D dilated residual convolutional network (3D-DRCN) that leverages the time-frequency-space-domain correlations of the channel to further improve the channel estimates of each user. Numerical results demonstrate that the proposed LBPCE significantly outperforms the state-of-the-art estimators while maintaining a relatively low computational complexity.
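The three-variable planar approximation described in this abstract can be illustrated with a minimal sketch. It assumes a sub-block stored as a list of rows indexed by OFDM symbol and then subcarrier, with coordinates centred on the block so that the intercept equals the block mean. The function names `fit_plane` and `plane_values` are hypothetical, and a plain least-squares fit is used here for illustration only; it is not the learning-based estimator proposed in the paper.

```python
def fit_plane(block):
    """Least-squares fit of h[t][f] ~ mean + slope_t*(t - tc) + slope_f*(f - fc).

    `block` is a T x F list of (possibly complex) channel gains with T, F >= 2.
    """
    T, F = len(block), len(block[0])
    tc, fc = (T - 1) / 2, (F - 1) / 2  # block centre
    mean = sum(sum(row) for row in block) / (T * F)
    # On a centred regular grid the three regressors are mutually orthogonal,
    # so each coefficient is an independent projection.
    den_t = F * sum((t - tc) ** 2 for t in range(T))
    den_f = T * sum((f - fc) ** 2 for f in range(F))
    slope_t = sum((t - tc) * block[t][f]
                  for t in range(T) for f in range(F)) / den_t
    slope_f = sum((f - fc) * block[t][f]
                  for t in range(T) for f in range(F)) / den_f
    return mean, slope_t, slope_f


def plane_values(mean, slope_t, slope_f, T, F):
    """Rebuild the T x F sub-block from its three plane parameters."""
    tc, fc = (T - 1) / 2, (F - 1) / 2
    return [[mean + slope_t * (t - tc) + slope_f * (f - fc)
             for f in range(F)] for t in range(T)]
```

For a 4 x 6 sub-block this replaces 24 unknown channel gains with 3 parameters, which is the kind of reduction in unknowns the abstract refers to.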
Abstract:To exploit the unprecedented data generation in mobile edge networks, federated learning (FL) has emerged as a promising alternative to conventional centralized machine learning (ML). However, FL deployment faces some critical challenges. One major challenge, the straggler issue, severely limits FL's coverage, since the device with the weakest channel condition becomes the bottleneck of the model aggregation performance. Besides, the huge uplink communication overhead compromises the effectiveness of FL, which is particularly pronounced in large-scale systems. To address the straggler issue, we propose the integration of an unmanned aerial vehicle (UAV) as the parameter server (UAV-PS) to coordinate the FL implementation. We further employ the over-the-air computation technique, which leverages the superposition property of wireless channels for efficient uplink communication. Specifically, in this paper, we develop a novel UAV-enabled over-the-air asynchronous FL (UAV-AFL) framework which supports the UAV-PS in updating the model continuously to enhance the learning performance. Moreover, we conduct a convergence analysis to quantitatively capture the impact of model asynchrony, device selection and communication errors on the UAV-AFL learning performance. Based on this, a unified communication-learning problem is formulated to maximize the asymptotic learning performance by optimizing the UAV-PS trajectory, device selection and over-the-air transceiver design. Simulation results demonstrate that the proposed scheme achieves a substantial improvement in learning efficiency compared with the state-of-the-art approaches.
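The superposition property behind over-the-air computation can be illustrated with a toy sketch. It assumes scalar local updates, perfectly known flat-fading channel gains, and simple channel-inversion pre-scaling with no transmit power limit. These are illustrative assumptions, and the function name `aircomp_average` is hypothetical; the transceiver design optimized in the paper is not reproduced here.

```python
import random


def aircomp_average(local_values, channels, noise_std=0.0, rng=None):
    """Estimate the average of `local_values` from one superimposed reception.

    `channels` holds one non-zero (possibly complex) gain per device.
    """
    rng = rng or random.Random(0)
    # Each device pre-equalises its own channel (channel inversion, assumed).
    tx = [g / h for g, h in zip(local_values, channels)]
    # The multiple-access channel adds the faded signals in a single use.
    rx = sum(h * x for h, x in zip(channels, tx))
    # Additive complex Gaussian receiver noise.
    rx += complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
    # The server only rescales the sum; it never sees individual updates.
    return (rx / len(local_values)).real
```

All devices transmit simultaneously, so the uplink cost does not grow with the number of devices, while any noise or imperfect channel inversion shows up as an aggregation error.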
Abstract:In this paper, we propose a new message passing algorithm that utilizes hybrid vector message passing (HVMP) to solve the generalized bilinear factorization (GBF) problem. The proposed GBF-HVMP algorithm integrates expectation propagation (EP) and variational message passing (VMP) via variational free energy minimization, yielding tractable Gaussian messages. Furthermore, GBF-HVMP enables vector/matrix variables rather than scalar ones in message passing, resulting in a loop-free Bayesian network that improves convergence. Numerical results show that GBF-HVMP significantly outperforms state-of-the-art methods in terms of NMSE performance and computational complexity.
Abstract:Existing near-field localization algorithms generally face a scalability issue when the number of antennas at the sensor array grows large. To address this issue, this paper studies a passive localization system, where an extremely large-scale antenna array (ELAA) is deployed at the base station (BS) to locate a user that transmits signals. The user is considered to be in the near-field (Fresnel) region of the BS array. We propose a novel algorithm, named array partitioning based location estimation (APLE), for scalable near-field localization. The APLE algorithm is developed based on the basic assumption that, by partitioning the ELAA into multiple subarrays, the user can be approximated as being in the far-field region of each subarray. The APLE algorithm determines the user's location by exploiting the differences in the angles of arrival (AoAs) of the subarrays. Specifically, we establish a probability model of the received signal based on the geometric constraints of the user's location and the observed AoAs. Then, a message-passing algorithm, i.e., the proposed APLE algorithm, is designed for user localization. APLE exhibits computational complexity that is linear in the number of BS antennas, leading to a significant reduction in complexity compared to the existing methods. Besides, numerical results demonstrate that the proposed APLE algorithm outperforms the existing baselines in terms of localization accuracy.
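The geometric idea of locating a user from the differing AoAs at several subarrays can be sketched as follows. This is a plain least-squares triangulation in 2D, swapped in for illustration; it is not the message-passing APLE algorithm. The function name `triangulate`, the use of subarray centres as reference points, and the convention that each AoA is measured from the positive x-axis are all assumptions.

```python
import math


def triangulate(centers, aoas):
    """Least-squares intersection of bearing lines in the plane.

    centers: list of (x, y) subarray centres.
    aoas:    list of bearing angles in radians, measured from the +x axis.
    Returns the point minimising the sum of squared distances to all lines.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (cx, cy), theta in zip(centers, aoas):
        ux, uy = math.cos(theta), math.sin(theta)
        # P = I - u u^T projects onto the normal of the bearing line.
        p11, p12, p22 = 1.0 - ux * ux, -ux * uy, 1.0 - uy * uy
        a11 += p11
        a12 += p12
        a22 += p22
        b1 += p11 * cx + p12 * cy
        b2 += p12 * cx + p22 * cy
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("bearings are (nearly) parallel; location is ambiguous")
    # Solve the symmetric 2x2 normal equations by Cramer's rule.
    x = (a22 * b1 - a12 * b2) / det
    y = (a11 * b2 - a12 * b1) / det
    return x, y
```

The cost of this step grows with the number of subarrays rather than the number of antennas. When the user is far from the array, the bearings become nearly parallel and the system becomes ill-conditioned, which is why the differences between the subarray AoAs carry the location information.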
Abstract:Satellite Internet of Things (IoT) uses satellites as the access points for IoT devices to achieve global coverage in future IoT systems, and is expected to support burgeoning IoT applications, including communication, sensing, and computing. However, the complex and dynamic satellite environments and limited network resources raise new challenges in the design of satellite IoT systems. In this article, we focus on the joint design of communication, sensing, and computing to improve the performance of satellite IoT, which is quite different from the case of terrestrial IoT systems. We describe how the integration of the three functions can enhance system capabilities, and summarize the state-of-the-art solutions. Furthermore, we discuss the main challenges of integrating communication, sensing, and computing in satellite IoT that urgently need to be solved.
Abstract:Decentralized federated learning (DFL), inherited from distributed optimization, is an emerging paradigm to leverage the explosively growing data from wireless devices in a fully distributed manner. DFL enables joint training of a machine learning model via device-to-device (D2D) communication without the coordination of a parameter server. However, the deployment of wireless DFL faces some pivotal challenges. Communication is a critical bottleneck due to the extensive message exchange required between neighboring devices to share the learned model. Besides, consensus becomes increasingly difficult as the number of devices grows because there is no central server available to perform coordination. To overcome these difficulties, this paper proposes employing over-the-air computation (Aircomp) to improve communication efficiency by exploiting the superposition property of analog waveforms in multi-access channels, and introduces the mixing matrix mechanism to promote consensus using the spectral property of a symmetric doubly stochastic matrix. Specifically, we develop a novel multiple-input multiple-output over-the-air DFL (MIMO OA-DFL) framework to study the over-the-air DFL problem over MIMO multiple access channels. We conduct a general convergence analysis to quantitatively capture the influence of the aggregation weight and communication error on the MIMO OA-DFL performance in \emph{ad hoc} networks. The result shows that the communication error, together with the spectral gap of the mixing matrix, has a significant impact on the learning performance. Based on this, a joint communication-learning optimization problem is formulated to optimize the transceiver beamformers and the mixing matrix. Extensive numerical experiments are performed to reveal the characteristics of different topologies and demonstrate the substantial learning performance enhancement of our proposed algorithm.
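The role of a symmetric doubly stochastic mixing matrix in reaching consensus without a server can be illustrated with a small sketch. It assumes an undirected, connected topology given as neighbor lists and uses Metropolis weights, a standard construction chosen here for illustration. The paper optimizes the mixing matrix itself, which is not reproduced, and the function names are hypothetical.

```python
def metropolis_mixing_matrix(neighbors):
    """Build a symmetric doubly stochastic matrix from neighbor lists.

    neighbors[i] lists the devices adjacent to device i (undirected graph,
    no self-loops, so j in neighbors[i] implies i in neighbors[j]).
    """
    n = len(neighbors)
    deg = [len(nb) for nb in neighbors]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            W[i][j] = 1.0 / (1 + max(deg[i], deg[j]))
        # The self-weight absorbs the remainder so each row sums to one.
        W[i][i] = 1.0 - sum(W[i])
    return W


def mix_step(W, x):
    """One consensus round: each device averages over its neighborhood."""
    n = len(x)
    return [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]


# Ring of five devices, each holding one scalar "model".
n = 5
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
W = metropolis_mixing_matrix(ring)
x = [1.0, 2.0, 3.0, 4.0, 10.0]
for _ in range(50):
    x = mix_step(W, x)
# Every entry of x is now very close to the initial average, 4.0.
```

Because the matrix is doubly stochastic, the network-wide average is preserved in every round. The speed at which the devices agree is governed by the spectral gap, so sparser topologies such as a large ring mix more slowly than densely connected ones.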