Abstract: Extremely large-scale antenna arrays (ELAA) play a critical role in enabling the functionalities of next-generation wireless communication systems. However, as the number of antennas increases, ELAA systems face significant bottlenecks, such as excessive interconnection costs and high computational complexity. Efficient distributed signal processing (SP) algorithms show great promise in overcoming these challenges. In this paper, we provide a comprehensive overview of distributed SP algorithms for ELAA systems, tailored to address these bottlenecks. We start by presenting three representative forms of ELAA systems: single-base-station ELAA systems, coordinated distributed antenna systems, and ELAA systems integrated with emerging technologies. For each form, we review the associated distributed SP algorithms in the literature. Additionally, we outline several important future research directions that are essential for improving the performance and practicality of ELAA systems.
Abstract: In this work, we focus on the alignment problem of diffusion models with a continuous reward function, which represents specific objectives for downstream tasks, such as improving human preference. The central goal of the alignment problem is to adjust the distribution learned by diffusion models such that the generated samples maximize the target reward function. We propose a novel alignment approach, named Direct Noise Optimization (DNO), that optimizes the injected noise during the sampling process of diffusion models. By design, DNO is tuning-free and prompt-agnostic, as the alignment occurs in an online fashion during generation. We rigorously study the theoretical properties of DNO and also propose variants to deal with non-differentiable reward functions. Furthermore, we identify that a naive implementation of DNO occasionally suffers from the out-of-distribution reward hacking problem, where optimized samples have high rewards but are no longer in the support of the pretrained distribution. To remedy this issue, we leverage classical high-dimensional statistics theory and propose to augment the DNO loss with a probability regularization term. We conduct extensive experiments on several popular reward functions trained on human feedback data and demonstrate that the proposed DNO approach achieves state-of-the-art reward scores as well as high image quality, all within a reasonable time budget for generation.
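A minimal sketch of the direct noise optimization idea, under simplifying assumptions: a toy differentiable `sample` function and reward stand in for a real diffusion sampler and a learned reward model, and the Gaussian-norm penalty with weight `lam` is only an illustrative stand-in for the probability regularization described above.

```python
import torch

def direct_noise_optimization(sample, reward, noise_shape, steps=200, lr=0.05, lam=0.1):
    """Optimize the injected noise so that the generated sample maximizes a reward.

    `sample`: differentiable map from noise to a generated sample (stand-in for the
              diffusion sampling process).
    `reward`: differentiable reward on generated samples.
    The penalty keeps the optimized noise close to a standard Gaussian, a simple
    stand-in for the probability regularization that discourages reward hacking.
    """
    z = torch.randn(noise_shape, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        x = sample(z)
        # negative reward plus a pull toward the Gaussian prior
        loss = -reward(x) + lam * (z.pow(2).mean() - 1.0).abs()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach(), sample(z.detach())

# Toy usage: a linear "sampler" and a reward preferring samples close to a target.
if __name__ == "__main__":
    torch.manual_seed(0)
    W = torch.randn(4, 4)
    target = torch.ones(4)
    sampler = lambda z: z @ W
    rew = lambda x: -(x - target).pow(2).sum()
    z_star, x_star = direct_noise_optimization(sampler, rew, (4,))
    print(x_star)
```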
Abstract: In Federated Learning (FL), a framework for training machine learning models across distributed data, well-known algorithms like FedAvg tend to have slow convergence rates, resulting in high communication costs during training. To address this challenge, we introduce FedLion, an adaptive federated optimization algorithm that seamlessly incorporates key elements from the recently proposed centralized adaptive algorithm, Lion (Chen et al. 2023), into the FL framework. Through comprehensive evaluations on two widely adopted FL benchmarks, we demonstrate that FedLion outperforms previous state-of-the-art adaptive algorithms, including FAFED (Wu et al. 2023) and FedDA. Moreover, thanks to the use of signed gradients in local training, FedLion substantially reduces the data transmitted during uplink communication compared to existing adaptive algorithms, further reducing communication costs. Last but not least, this work also includes a novel theoretical analysis showing that FedLion attains a faster convergence rate than established FL algorithms like FedAvg.
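A hedged sketch of the signed-update idea that makes uplink compression cheap: a Lion-style local step whose direction is a sign vector. The hyperparameters, the quantities returned, and the simple averaging rule at the server are illustrative assumptions rather than the exact FedLion specification.

```python
import torch

def lion_local_step(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion-style update: the step direction is a sign vector, which is why
    transmitting local updates costs roughly 1 bit per coordinate."""
    update = torch.sign(beta1 * momentum + (1 - beta1) * grad)
    param = param - lr * (update + wd * param)
    momentum = beta2 * momentum + (1 - beta2) * grad
    return param, momentum, update  # `update` in {-1, 0, +1} is what a client would send

def server_aggregate(sign_updates):
    """Illustrative server rule: average the clients' signed updates
    (an assumption for the sketch, not necessarily FedLion's exact aggregation)."""
    return torch.stack(sign_updates).mean(dim=0)
```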
Abstract: Diffusion models have emerged as state-of-the-art generative models for image generation. However, sampling from diffusion models is usually time-consuming due to the inherent autoregressive nature of their sampling process. In this work, we propose a novel approach that accelerates the sampling of diffusion models by parallelizing the autoregressive process. Specifically, we reformulate the sampling process as solving a system of triangular nonlinear equations through fixed-point iteration. With this innovative formulation, we explore several systematic techniques to further reduce the number of iterations required by the solving process. Applying these techniques, we introduce ParaTAA, a universal and training-free parallel sampling algorithm that can leverage extra computational and memory resources to increase the sampling speed. Our experiments demonstrate that ParaTAA can decrease the inference steps required by common sequential sampling algorithms such as DDIM and DDPM by a factor of 4 to 14. Notably, when applying ParaTAA with 100-step DDIM sampling for Stable Diffusion, a widely used text-to-image diffusion model, it can produce the same images as sequential sampling in only 7 inference steps.
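An illustrative sketch (not ParaTAA itself) of the reformulation: a sequential sampler x_{t+1} = f(t, x_t) defines a triangular nonlinear system in all iterates, which can be solved by a Jacobi-style fixed-point sweep that updates every step from the previous guess and therefore parallelizes across steps; for a triangular system the sweep recovers the exact sequential trajectory after at most T iterations.

```python
import numpy as np

def sequential_sampling(f, x0, T):
    """Baseline: T sequential steps x_{t+1} = f(t, x_t)."""
    xs = [x0]
    for t in range(T):
        xs.append(f(t, xs[-1]))
    return xs

def parallel_fixed_point_sampling(f, x0, T, iters=50, tol=1e-8):
    """Solve the triangular system x_{t+1} = f(t, x_t), t = 0..T-1, by iterating all
    equations jointly; each sweep updates every step from the previous guess, so the
    T evaluations of f can run in parallel."""
    xs = [x0] + [x0.copy() for _ in range(T)]
    for _ in range(iters):
        new_xs = [x0] + [f(t, xs[t]) for t in range(T)]
        converged = max(np.linalg.norm(new_xs[t + 1] - xs[t + 1]) for t in range(T)) < tol
        xs = new_xs
        if converged:
            break
    return xs

# Toy check: the parallel solve matches the sequential trajectory.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = 0.5 * rng.standard_normal((3, 3))
    f = lambda t, x: np.tanh(A @ x) / (t + 2)
    x0 = rng.standard_normal(3)
    seq = sequential_sampling(f, x0, T=10)
    par = parallel_fixed_point_sampling(f, x0, T=10)
    print(np.allclose(seq[-1], par[-1]))
```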
Abstract: Mathematical optimization is now widely regarded as an indispensable modeling and solution tool for the design of wireless communications systems. While optimization has played a significant role in the revolutionary progress in wireless communication and networking technologies from 1G to 5G and onto the future 6G, the innovations in wireless technologies have also substantially transformed the nature of the underlying mathematical optimization problems upon which the system designs are based and have sparked significant innovations in the development of methodologies to understand, analyze, and solve those problems. In this paper, we provide a comprehensive survey of recent advances in mathematical optimization theory and algorithms for wireless communication system design. We begin by illustrating common features of mathematical optimization problems arising in wireless communication system design. We discuss various scenarios and use cases and their associated mathematical structures from an optimization perspective. We then provide an overview of recent advances in mathematical optimization theory and algorithms, from nonconvex optimization, global optimization, and integer programming to distributed optimization and learning-based optimization. The key to successfully solving mathematical optimization problems lies in carefully choosing and/or developing suitable optimization algorithms (or neural network architectures) that exploit the underlying problem structure. We conclude the paper by identifying several open research challenges and outlining future research directions.
Abstract: This paper considers the quality-of-service (QoS)-based joint beamforming and compression design problem in the downlink cooperative cellular network, where multiple relay-like base stations (BSs), connected to the central processor via rate-limited fronthaul links, cooperatively transmit messages to the users. The problem of interest is formulated as the minimization of the total transmit power of the BSs, subject to all users' signal-to-interference-plus-noise ratio (SINR) constraints and all BSs' fronthaul rate constraints. In this paper, we first show that there is no duality gap between the considered joint optimization problem and its Lagrangian dual by showing the tightness of its semidefinite relaxation (SDR). Then, we propose an efficient algorithm based on the above duality result for solving the considered problem. The proposed algorithm judiciously exploits the special structure of an enhanced set of Karush-Kuhn-Tucker (KKT) conditions for the considered problem and finds a solution satisfying these enhanced KKT conditions via two fixed-point iterations. Two key features of the proposed algorithm are: (1) it is able to detect whether the considered problem is feasible and, when it is, find its globally optimal solution; (2) it is highly efficient because both fixed-point iterations in the proposed algorithm are linearly convergent and evaluating the functions in the fixed-point iterations is computationally cheap. Numerical results show the global optimality and efficiency of the proposed algorithm.
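The fixed-point machinery above is specific to the joint beamforming and compression problem; as a generic illustration of what a linearly convergent, SINR-targeted fixed-point iteration of this flavor looks like, the following sketch implements the classical power-control fixed point (a textbook standard interference-function iteration, not the paper's algorithm).

```python
import numpy as np

def sinr_fixed_point_power_control(G, gammas, sigma2=1.0, iters=100, tol=1e-10):
    """Classical fixed-point iteration for meeting SINR targets with minimum power:
        p_k <- gamma_k * (sigma^2 + sum_{j != k} G[k, j] p_j) / G[k, k].
    The right-hand side is a standard interference function, so the iteration
    converges monotonically to the minimal feasible power vector when one exists.
    `G[k, j]` is the channel gain from transmitter j to receiver k."""
    p = np.zeros(len(gammas))
    for _ in range(iters):
        interference = G @ p - np.diag(G) * p + sigma2   # noise plus others' interference
        p_new = gammas * interference / np.diag(G)
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    return p
```

When the SINR targets are jointly feasible, the iteration converges to the minimal power vector; otherwise the powers grow without bound, which is one simple way such fixed-point schemes can detect infeasibility.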
Abstract: Recently, the decentralized baseband processing (DBP) paradigm and relevant detection methods have been proposed to enable extremely large-scale massive multiple-input multiple-output technology. Under the DBP architecture, base station antennas are divided into several independent clusters, each connected to a local computing fabric. However, current detection methods tailored to DBP only consider ideal white Gaussian noise scenarios, while in practice, the noise is often colored due to interference from neighboring cells. Moreover, in the DBP architecture, linear minimum mean-square error (LMMSE) detection methods rely on the estimation of the noise covariance matrix through averaging distributedly stored noise samples. This presents a significant challenge for decentralized LMMSE-based equalizer design. To address this issue, this paper proposes decentralized LMMSE equalization methods under colored noise scenarios for both star and daisy-chain DBP architectures. Specifically, we first propose two decentralized equalizers for the star DBP architecture based on dimensionality reduction techniques. Then, we derive an optimal decentralized equalizer using the block coordinate descent (BCD) method for the daisy-chain DBP architecture with a bandwidth reduction enhancement scheme based on decentralized low-rank decomposition. Finally, simulation results demonstrate that our proposed methods can achieve excellent detection performance while requiring much less communication bandwidth.
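For intuition about why the star architecture decentralizes naturally, here is a hedged sketch of decentralized LMMSE equalization in the simpler white-noise case that the paper generalizes; the colored-noise, dimensionality-reduction, and daisy-chain methods above are not reproduced. Each cluster ships only a small Gram matrix and matched-filter vector instead of its raw antenna observations.

```python
import numpy as np

def decentralized_lmmse_star(H_clusters, y_clusters, sigma2):
    """Illustrative star-architecture LMMSE under *white* noise: cluster c computes
    its local Gram matrix H_c^H H_c and matched-filter output H_c^H y_c, and only
    these K x K and K x 1 quantities are sent to the fusion node, which assembles
    the global LMMSE estimate of the K user streams."""
    K = H_clusters[0].shape[1]                               # number of user streams
    gram = sum(Hc.conj().T @ Hc for Hc in H_clusters)        # fused Gram matrix
    mf = sum(Hc.conj().T @ yc for Hc, yc in zip(H_clusters, y_clusters))
    return np.linalg.solve(gram + sigma2 * np.eye(K), mf)
```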
Abstract: In this paper, we focus on a novel optimization problem in which the objective function is a black box and can only be evaluated through a ranking oracle. This problem is common in real-world applications, particularly in cases where the function is assessed by human judges. Reinforcement Learning with Human Feedback (RLHF) is a prominent example of such an application, adopted by recent works \cite{ouyang2022training,liu2023languages,chatgpt,bai2022training} to improve the quality of Large Language Models (LLMs) with human guidance. We propose ZO-RankSGD, a first-of-its-kind zeroth-order optimization algorithm, to solve this optimization problem with a theoretical guarantee. Specifically, our algorithm employs a new rank-based random estimator for the descent direction and is proven to converge to a stationary point. ZO-RankSGD can also be directly applied to the policy search problem in reinforcement learning when only a ranking oracle of the episode reward is available. This makes ZO-RankSGD a promising alternative to existing RLHF methods, as it optimizes in an online fashion and thus can work without any pre-collected data. Furthermore, we demonstrate the effectiveness of ZO-RankSGD in a novel application: improving the quality of images generated by a diffusion generative model with human ranking feedback. In our experiments, we found that ZO-RankSGD can significantly enhance the detail of generated images with only a few rounds of human feedback. Overall, our work advances the field of zeroth-order optimization by addressing the problem of optimizing functions with only ranking feedback, and offers an effective approach for aligning human and machine intentions in a wide range of domains. Our code is released at \url{https://github.com/TZW1998/Taming-Stable-Diffusion-with-Human-Ranking-Feedback}.
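An illustrative sketch of a rank-based zeroth-order step under stated assumptions: the linear rank-to-score weighting, perturbation scale `mu`, and step size `lr` are choices made for this example and are not the exact ZO-RankSGD estimator.

```python
import numpy as np

def zo_rank_step(x, ranking_oracle, rng, m=8, mu=0.1, lr=0.05):
    """One zeroth-order update from ranking feedback: query the oracle with m
    randomly perturbed points, turn the returned ranking into centered scores,
    and move along the score-weighted average of the perturbation directions."""
    U = rng.standard_normal((m, x.size))                      # random search directions
    candidates = [x + mu * u for u in U]
    order = np.asarray(ranking_oracle(candidates))            # indices, best to worst
    scores = np.zeros(m)
    scores[order] = np.linspace(1.0, -1.0, m)                 # best +1, ..., worst -1
    ascent = (scores[:, None] * U).mean(axis=0) / mu
    return x + lr * ascent

# Toy usage: the "judge" secretly ranks candidates by closeness to a hidden target.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = np.array([1.0, -2.0, 0.5])
    oracle = lambda cands: np.argsort([np.sum((c - target) ** 2) for c in cands])
    x = np.zeros(3)
    for _ in range(300):
        x = zo_rank_step(x, oracle, rng)
    print(x)  # drifts toward `target` using only ranking feedback
```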
Abstract: Localized channel modeling is crucial for offline performance optimization of 5G cellular networks, but existing channel models target general scenarios and do not capture local geographical structures. In this paper, we propose a novel physics-based and data-driven localized statistical channel modeling (LSCM) method, which is capable of sensing the physical geographical structure of the targeted cellular environment. The proposed channel modeling relies solely on the reference signal received power (RSRP) of the user equipment, unlike traditional methods that use full channel impulse response matrices. The key is to build the relationship between the RSRP and the channel's angular power spectrum. Based on this relationship, we formulate the task of channel modeling as a sparse recovery problem in which the non-zero entries of the sparse vector indicate the powers and angles of departure of the channel paths. A computationally efficient weighted non-negative orthogonal matching pursuit (WNOMP) algorithm is devised for solving the formulated problem. Finally, experiments based on synthetic and real RSRP measurements are presented to examine the performance of the proposed method.
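A hedged sketch of the greedy recovery step, assuming a known dictionary `A` whose columns correspond to candidate departure angles; the per-atom `weights` and the fixed sparsity level `k` are illustrative simplifications of the WNOMP algorithm described above.

```python
import numpy as np
from scipy.optimize import nnls

def weighted_nn_omp(A, y, weights, k):
    """Greedy sparse recovery with a non-negativity constraint (illustrative sketch of
    the WNOMP idea): at each step pick the column whose weighted correlation with the
    residual is largest, then refit the selected coefficients by non-negative least
    squares. `weights` models per-atom (per-angle) reliability."""
    residual = y.copy()
    support = []
    x = np.zeros(A.shape[1])
    for _ in range(k):
        corr = weights * (A.T @ residual)
        corr[support] = -np.inf                          # do not reselect chosen atoms
        j = int(np.argmax(corr))
        if corr[j] <= 0:
            break
        support.append(j)
        coef, _ = nnls(A[:, support], y)                 # non-negative refit on support
        x[:] = 0.0
        x[support] = coef
        residual = y - A[:, support] @ coef
    return x
```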
Abstract: Federated Learning (FL) is a promising privacy-preserving distributed learning paradigm but suffers from high communication cost when training large-scale machine learning models. Sign-based methods, such as SignSGD \cite{bernstein2018signsgd}, have been proposed as a biased gradient compression technique for reducing the communication cost. However, sign-based algorithms can diverge under heterogeneous data, which has motivated the development of advanced techniques, such as the error-feedback method and stochastic sign-based compression, to fix this issue. Nevertheless, these methods still suffer from slower convergence rates. Besides, none of them allows multiple local SGD updates like FedAvg \cite{mcmahan2017communication}. In this paper, we propose a novel noisy perturbation scheme with a general symmetric noise distribution for sign-based compression, which not only allows one to flexibly control the tradeoff between gradient bias and convergence performance, but also provides a unified viewpoint on existing stochastic sign-based methods. More importantly, the unified noisy perturbation scheme enables the development of the very first sign-based FedAvg algorithm ($z$-SignFedAvg) to accelerate convergence. Theoretically, we show that $z$-SignFedAvg achieves a faster convergence rate than existing sign-based methods and, under uniformly distributed noise, enjoys the same convergence rate as its uncompressed counterpart. Extensive experiments demonstrate that $z$-SignFedAvg achieves competitive empirical performance on real datasets and outperforms existing schemes.
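A minimal sketch of the noisy perturbation idea under illustrative assumptions (the noise scale, the uniform/Gaussian choice, and the simple averaging server step are not tied to the paper's exact parameterization): each client adds symmetric noise to its local update before taking the sign, which makes the expectation of the 1-bit message a monotone, controllable function of the true update.

```python
import torch

def noisy_sign_compress(update, noise_scale=1.0, dist="uniform"):
    """Sign compression with a symmetric noise perturbation: transmitting
    sign(update + z) instead of sign(update) trades a little variance for a
    controllable bias, since E[sign(u + z)] is a monotone function of u."""
    if dist == "uniform":
        z = noise_scale * (2 * torch.rand_like(update) - 1)   # Uniform(-s, s)
    else:
        z = noise_scale * torch.randn_like(update)            # Gaussian, also symmetric
    return torch.sign(update + z)

def server_average(signed_updates, lr=0.1):
    """Server step: average the 1-bit messages and scale by a step size
    (illustrative aggregation, not necessarily the paper's exact rule)."""
    return lr * torch.stack(signed_updates).mean(dim=0)
```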