University of Waterloo
Abstract:Dual-function radar and communication (DFRC) is a promising research direction within integrated sensing and communication (ISAC) that improves hardware and spectrum efficiency by merging sensing and communication (S&C) functionalities into a shared platform. However, the DFRC receiver (DFRC-R) must simultaneously detect the uplink communication signal and estimate target-related parameters from the echoes, which leads to mutual interference. In this paper, a projection-based scheme is proposed that equivalently transforms the joint signal detection and target estimation problem into a joint signal detection process across multiple snapshots. Compared with conventional successive interference cancellation (SIC) schemes, the proposed approach achieves a higher signal-to-noise ratio (SNR) and a higher ergodic rate when the radar signal is non-negligible. Nonetheless, it introduces an ill-conditioned signal detection problem, which is addressed using a non-linear detector. By jointly processing more snapshots, the proposed scheme can achieve high S&C performance simultaneously.
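The abstract does not give the projection itself. As a hedged illustration only, the sketch below shows a generic orthogonal projection P = I - a a^H / (a^H a) that removes a component lying along a known one-dimensional subspace `a`. It assumes the echo direction is known and uses a hypothetical uniform-linear-array steering vector. This is not necessarily the authors' construction. Note that the communication component is projected too, so when the two directions are not orthogonal, part of its energy is lost.

```python
import cmath

def hermitian_inner(u, v):
    """Return u^H v for equal-length lists of complex numbers."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def project_out(y, a):
    """Apply P = I - a a^H / (a^H a), removing the component of y along a."""
    coef = hermitian_inner(a, y) / hermitian_inner(a, a)
    return [yi - coef * ai for yi, ai in zip(y, a)]

def steering(n, phase_step):
    """Hypothetical ULA steering vector [1, e^{j*phase_step}, e^{j*2*phase_step}, ...]."""
    return [cmath.exp(1j * phase_step * k) for k in range(n)]

n = 8
a_echo = steering(n, 0.9)   # assumed-known echo direction
a_user = steering(n, -0.4)  # uplink user direction
s_echo, s_user = 3.0 + 1.0j, 0.5 - 0.2j
y = [s_echo * e + s_user * u for e, u in zip(a_echo, a_user)]

y_proj = project_out(y, a_echo)
residual_echo = abs(hermitian_inner(a_echo, y_proj))  # ~0: echo component removed
kept = project_out(a_user, a_echo)                    # part of a_user that survives
```

After projection, `y_proj` equals `s_user * kept`. The echo has been removed, but so has the part of the user signal that is collinear with `a_echo`.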
Abstract:The photorealistic reconstruction and rendering of architectural scenes have extensive applications in industries such as film, games, and transportation. They also play an important role in urban planning, architectural design, and city promotion, especially in protecting historical and cultural relics. Owing to its better performance than NeRF, 3D Gaussian Splatting has become a mainstream technology in 3D reconstruction. Its only input is a set of images, but it relies heavily on geometric parameters computed by the structure-from-motion (SfM) process. At the same time, an abundance of raw 3D models already exists that could inform the structural perception of certain buildings but cannot be applied directly. In this paper, we propose a straightforward method that harnesses these raw 3D models to guide 3D Gaussians in capturing the basic shape of a building and to improve the visual quality of textures and details when photos are captured non-systematically. This exploration opens up new possibilities for improving the effectiveness of 3D reconstruction techniques in architectural design.
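The abstract does not say how the raw 3D models guide the Gaussians. One plausible ingredient, offered here purely as a hypothetical sketch, is to initialize Gaussian centers from points sampled uniformly on the mesh surface rather than from SfM points. The sketch does this with area-weighted triangle selection and uniform barycentric sampling. The mesh format (vertex list plus index triples) is an assumption.

```python
import random

def triangle_area(p0, p1, p2):
    """Area of a 3D triangle from the cross product of two edge vectors."""
    ux, uy, uz = (p1[i] - p0[i] for i in range(3))
    vx, vy, vz = (p2[i] - p0[i] for i in range(3))
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * (cx * cx + cy * cy + cz * cz) ** 0.5

def sample_surface(vertices, faces, n, seed=0):
    """Draw n points uniformly over the mesh surface (hypothetical Gaussian-center init)."""
    rng = random.Random(seed)
    areas = [triangle_area(*(vertices[i] for i in f)) for f in faces]
    points = []
    for f in rng.choices(faces, weights=areas, k=n):  # larger triangles are picked more often
        p0, p1, p2 = (vertices[i] for i in f)
        r1, r2 = rng.random(), rng.random()
        if r1 + r2 > 1.0:  # reflect so (r1, r2) is uniform over the triangle
            r1, r2 = 1.0 - r1, 1.0 - r2
        points.append(tuple(p0[k] + r1 * (p1[k] - p0[k]) + r2 * (p2[k] - p0[k])
                            for k in range(3)))
    return points

# Hypothetical facade: a unit square in the z = 0 plane made of two triangles.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
tris = [(0, 1, 2), (0, 2, 3)]
centers = sample_surface(verts, tris, 1000)
```

Area weighting matters for buildings, where large walls and small ornaments coexist. Without it, small triangles would be over-represented.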
Abstract:In this paper, we consider time-varying channel estimation in millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems with hybrid beamforming architectures. Unlike existing contributions that considered single-carrier mmWave systems with high mobility, this work considers wideband orthogonal frequency division multiplexing (OFDM) systems. To solve the channel estimation problem under double selectivity (in time and frequency), we propose a pilot transmission scheme based on 5G OFDM, and the received signals are arranged into a fourth-order tensor that fits the low-rank CANDECOMP/PARAFAC (CP) model. By further exploiting the Vandermonde structure of the factor matrix, we propose a tensor-subspace decomposition based channel estimation method to solve the CP decomposition and analyze its uniqueness condition. From the decomposed factor matrices, the channel parameters, including angles of arrival/departure, delays, channel gains, and Doppler shifts, are estimated, and the Cramér-Rao bound (CRB) is derived as a performance metric. Simulation results demonstrate the superior performance of the proposed method over other benchmarks. Furthermore, the channel estimation methods are tested on channel parameters generated by Wireless InSite, and the simulation results show the effectiveness of the proposed method in practical scenarios.
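The Vandermonde structure is what makes the parameters recoverable from the factor matrices. A column of the form g·[1, e^{jω}, e^{j2ω}, …], for example a Doppler or angle signature, satisfies a[k+1] = e^{jω}·a[k]. The sketch below is a simple textbook shift-invariance estimator based on that identity. It is not the paper's tensor-subspace method. The column length and the values of ω and the gain are arbitrary assumptions.

```python
import cmath

def vandermonde_column(n, omega, gain=1.0):
    """Column gain * [1, e^{j omega}, ..., e^{j (n-1) omega}] of a Vandermonde factor matrix."""
    return [gain * cmath.exp(1j * omega * k) for k in range(n)]

def estimate_frequency(col):
    """Estimate omega from the shift invariance a[k+1] = e^{j omega} a[k].

    Each product a[k+1] * conj(a[k]) equals |gain|^2 e^{j omega}, so the phase of
    their sum recovers omega (valid for omega in (-pi, pi]).
    """
    acc = sum(col[k + 1] * col[k].conjugate() for k in range(len(col) - 1))
    return cmath.phase(acc)

col = vandermonde_column(16, 0.7, gain=2.0 - 0.5j)
print(round(estimate_frequency(col), 6))  # 0.7
```

Summing over all adjacent pairs, instead of using a single ratio, averages out noise when the column comes from a noisy CP decomposition. The complex gain cancels because only the phase is used.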
Abstract:In this paper, we investigate a cascaded channel estimation method for a millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) system aided by a reconfigurable intelligent surface (RIS), where the base station (BS) is equipped with low-resolution analog-to-digital converters (ADCs) and both the BS and the RIS employ a uniform planar array (UPA). Owing to the sparsity of the mmWave channel, channel estimation can be formulated as a compressed sensing (CS) problem. However, low-resolution quantization causes severe information loss, and traditional CS algorithms do not work well. To recover the signal and the sparse angular-domain channel from the quantized outputs, we introduce Bayesian inference and the efficient vector approximate message passing (VAMP) algorithm to solve the quantized-output CS problem. To further improve the efficiency of the VAMP algorithm, a Fast Fourier Transform (FFT) based fast computation method is derived. Simulation results demonstrate the effectiveness and accuracy of the proposed cascaded channel estimation method for the RIS-aided mmWave massive MIMO system with few-bit ADCs. Furthermore, at low signal-to-noise ratio (SNR), the proposed method achieves an acceptably small performance gap between low-resolution and infinite-resolution ADCs, which implies the applicability of few-bit ADCs in practice.
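The abstract does not spell out the FFT-based speed-up. A common pattern, assumed here, is that with a UPA the angular-domain dictionary is a 2D DFT (a Kronecker product of two 1D DFT matrices). Multiplying by it can then be done with 1D FFTs along the rows and then the columns, instead of a dense matrix product. The sketch checks that equivalence in pure Python. The 4 x 8 array size and the test data are hypothetical.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def fft2(grid):
    """2D DFT of a rows x cols grid: 1D FFTs along every row, then every column."""
    rows = [fft(r) for r in grid]
    cols = [fft(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def dft2_naive(grid):
    """Direct O((N1*N2)^2) evaluation of the same 2D DFT, for comparison."""
    n1, n2 = len(grid), len(grid[0])
    return [[sum(grid[m1][m2] * cmath.exp(-2j * cmath.pi * (k1 * m1 / n1 + k2 * m2 / n2))
                 for m1 in range(n1) for m2 in range(n2))
             for k2 in range(n2)]
            for k1 in range(n1)]

# Hypothetical 4 x 8 UPA snapshot (rows = vertical elements, columns = horizontal).
x = [[complex(r - c, (r * c) % 3) for c in range(8)] for r in range(4)]
fast, slow = fft2(x), dft2_naive(x)
err = max(abs(a - b) for ra, rb in zip(fast, slow) for a, b in zip(ra, rb))
print(err < 1e-9)  # True
```

For an N1 x N2 array, the row/column FFT costs O(N1·N2·log(N1·N2)) instead of O((N1·N2)^2). This is why such transforms pay off inside iterative solvers like VAMP, which apply the dictionary and its adjoint many times.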
Abstract:In this paper, we explore FP8 low-bit data formats for efficient training of large language models (LLMs). Our key insight is that most variables in LLM training, such as gradients and optimizer states, can employ low-precision data formats without compromising model accuracy or requiring changes to hyper-parameters. Specifically, we propose a new FP8 automatic mixed-precision framework for training LLMs. This framework offers three levels of FP8 utilization to streamline mixed-precision and distributed parallel training for LLMs, incorporating 8-bit gradients, optimizer states, and distributed learning incrementally. Experimental results show that, when training the GPT-175B model on an H100 GPU platform, our FP8 mixed-precision training framework not only reduced real memory usage by 42% but also ran 64% faster than the widely adopted BF16 framework (i.e., Megatron-LM), surpassing the speed of Nvidia Transformer Engine by 17%. This substantially reduces the training costs of large foundation models. Furthermore, our FP8 mixed-precision training methodology is generic: it can be seamlessly applied to other tasks such as LLM instruction tuning and reinforcement learning with human feedback, reducing fine-tuning costs. Our FP8 low-precision training framework is open-sourced at https://github.com/Azure/MS-AMP (aka.ms/MS.AMP).
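The abstract does not state which FP8 encoding or scaling scheme is used. As a hedged illustration, the sketch below simulates the E4M3 FP8 variant in pure Python (4 exponent bits, 3 mantissa bits, largest finite value 448, smallest normal 2^-6), together with a simple per-tensor scaling factor. Per-tensor scaling is a commonly used way to keep small values such as gradients from underflowing. This is not MS-AMP code and does not use any of its APIs. The gradient values are made up.

```python
import math

E4M3_MAX = 448.0    # largest finite E4M3 magnitude
E4M3_MIN_EXP = -6   # exponent of the smallest normal E4M3 number (2**-6)
E4M3_MANT_BITS = 3  # explicit mantissa bits

def quantize_e4m3(x):
    """Round x to the nearest E4M3 value (ties to even), saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    mag = min(abs(x), E4M3_MAX)
    exp = max(math.frexp(mag)[1] - 1, E4M3_MIN_EXP)  # floor(log2(mag)), clamped for subnormals
    step = 2.0 ** (exp - E4M3_MANT_BITS)              # spacing of representable values here
    q = round(mag / step) * step                      # Python's round() ties to even
    return math.copysign(min(q, E4M3_MAX), x)

def fp8_roundtrip(values):
    """Per-tensor scaling: map max |v| to E4M3_MAX, quantize, then undo the scale."""
    amax = max((abs(v) for v in values), default=0.0)
    scale = E4M3_MAX / amax if amax > 0.0 else 1.0
    return [quantize_e4m3(v * scale) / scale for v in values]

grads = [1e-5, -3.2e-4, 7.5e-3]            # hypothetical small gradient values
naive = [quantize_e4m3(g) for g in grads]  # 1e-5 underflows to 0.0 without scaling
scaled = fp8_roundtrip(grads)              # each value kept within 1/16 relative error
```

With 3 mantissa bits, round-to-nearest keeps any normal-range value within 2^-4 (6.25%) relative error. The scaling step exists to push small tensors into that normal range before casting.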
Abstract:Millimeter wave (mmWave) massive multiple-input multiple-output (massive MIMO) is one of the most promising technologies for fifth-generation and beyond wireless communication systems. However, a large number of antennas incurs high power consumption and hardware costs, and high-frequency communications place a heavy burden on the analog-to-digital converters (ADCs) at the base station (BS). Furthermore, equipping each antenna with a high-precision ADC is too costly in a large antenna array system. Adopting low-resolution ADCs is a promising way to address this problem. In this paper, we investigate cascaded channel estimation for a mmWave massive MIMO system aided by a reconfigurable intelligent surface (RIS), with the BS equipped with few-bit ADCs. Owing to the low-rank property of the cascaded channel, its estimation can be formulated as a low-rank matrix completion problem. To tackle the information loss caused by quantization, we introduce a Bayesian optimal estimation framework for estimating the user-RIS-BS cascaded channel. To implement the estimator and achieve the matrix completion, we use the efficient bilinear generalized approximate message passing (BiG-AMP) algorithm. Extensive simulation results verify that the proposed method can accurately estimate the cascaded channel for the RIS-aided mmWave massive MIMO system with low-resolution ADCs.
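To make the "information loss caused by quantization" concrete, the sketch below models a few-bit ADC. It applies a b-bit uniform mid-rise quantizer separately to the in-phase and quadrature parts of each complex sample. This quantizer and its step size are assumptions: a standard textbook model, not necessarily the one used in the paper.

```python
import math

def quantize_real(x, bits, step):
    """b-bit uniform mid-rise quantizer with output levels (k + 1/2) * step."""
    levels = 2 ** bits
    k = math.floor(x / step)
    k = max(-levels // 2, min(levels // 2 - 1, k))  # clip to the 2**bits available cells
    return (k + 0.5) * step

def adc(z, bits, step):
    """Quantize the in-phase and quadrature parts of a complex sample independently."""
    return complex(quantize_real(z.real, bits, step), quantize_real(z.imag, bits, step))

# With 1 bit only the sign of each part survives, so amplitude information is lost:
print(adc(0.1 + 0.1j, 1, 1.0) == adc(5.0 + 5.0j, 1, 1.0))  # True
```

With `bits=1` this reduces to a sign quantizer. Very different received amplitudes then map to the same output, which is why the estimator must model the quantizer explicitly rather than treat its output as a noisy linear measurement.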
Abstract:Despite recent monumental advances in the field, many Natural Language Processing (NLP) models still struggle to perform adequately on noisy domains. We propose a novel probabilistic embedding-level method to improve the robustness of NLP models. Our method, Robust Embeddings via Distributions (RED), incorporates information from both noisy tokens and surrounding context to obtain distributions over embedding vectors that can express uncertainty in semantic space more fully than any deterministic method. We evaluate our method on a number of downstream tasks using existing state-of-the-art models in the presence of both natural and synthetic noise, and demonstrate a clear improvement over other embedding approaches to robustness from the literature.
Abstract:The Kalman filter (KF) is widely used in various domains to perform sequential learning or variable estimation. In the context of autonomous vehicles, the KF constitutes the core component of many Advanced Driver Assistance Systems (ADAS), such as Forward Collision Warning (FCW). It tracks the states (distance, velocity, etc.) of relevant traffic objects based on sensor measurements. The tracking output of the KF is often fed into downstream logic to produce alerts, which human drivers then use to make driving decisions in near-collision scenarios. In this paper, we study adversarial attacks on the KF as part of the more complex machine-human hybrid system of Forward Collision Warning. Our attack goal is to negatively affect human braking decisions by causing the KF to output incorrect state estimates that lead to false or delayed alerts. We accomplish this by sequentially manipulating the measurements fed into the KF, and we propose a novel Model Predictive Control (MPC) approach to compute the optimal manipulation. Through experiments in a simulated driving environment, we show that the attacker can successfully change FCW alert signals through planned manipulation of measurements prior to the desired target time. These results demonstrate that our attack can stealthily mislead a distracted human driver and cause vehicle collisions.
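The sketch below is not the paper's MPC attack. It only shows the mechanism the attack exploits: a planned offset on the measurements propagates through a standard KF into a biased state estimate and therefore a later warning. It uses a constant-velocity KF with state [distance, relative velocity] and distance-only measurements. The time step, noise levels, initial state, closing speed, and the linearly growing spoof offset are all hypothetical, and process noise is simplified to Q = q·I.

```python
def kf_step(x, P, z, dt=0.1, q=0.01, r=0.25):
    """One predict/update cycle of a constant-velocity KF with state [distance, velocity]
    and a distance-only measurement z. Process noise is simplified to Q = q * I."""
    # Predict: x' = F x, P' = F P F^T + Q with F = [[1, dt], [0, 1]].
    d, v = x[0] + x[1] * dt, x[1]
    p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q
    p01 = P[0][1] + dt * P[1][1]
    p10 = P[1][0] + dt * P[1][1]
    p11 = P[1][1] + q
    # Update with H = [1, 0]: gain K = P' H^T / (H P' H^T + r).
    s = p00 + r
    k0, k1 = p00 / s, p10 / s
    innov = z - d
    x_new = [d + k0 * innov, v + k1 * innov]
    # Covariance update: P = (I - K H) P'.
    P_new = [[(1 - k0) * p00, (1 - k0) * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
    return x_new, P_new

def track(offset, steps=50, dt=0.1):
    """Track a lead vehicle closing at 2 m/s from 30 m; offset(k) is added to measurement k."""
    x, P = [30.0, 0.0], [[1.0, 0.0], [0.0, 10.0]]
    for k in range(1, steps + 1):
        true_distance = 30.0 - 2.0 * dt * k
        x, P = kf_step(x, P, true_distance + offset(k), dt)
    return x

def time_to_collision(state):
    distance, velocity = state
    return distance / -velocity  # assumes the gap is closing (velocity < 0)

clean = track(lambda k: 0.0)
attacked = track(lambda k: 0.05 * k)  # hypothetical slowly growing spoofed offset
print(time_to_collision(clean), time_to_collision(attacked))
```

Because the spoofed offset grows smoothly, each individual innovation stays small. The filter therefore absorbs the manipulation rather than rejecting it. A downstream rule that alerts when the estimated time-to-collision drops below a threshold would fire later on the attacked track.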
Abstract:Facial appearance plays an important role in our social lives. Subjective perception of women's beauty depends on various face-related (e.g., skin, shape, hair) and environmental (e.g., makeup, lighting, angle) factors. Similar to cosmetic surgery in the physical world, virtual face beautification is an emerging field with many open issues to be addressed. Inspired by the latest advances in style-based synthesis and face beauty prediction, we propose a novel face beautification framework. Given a reference face with a high beauty score, our GAN-based architecture can translate a query face into a sequence of beautified face images with the referenced beauty style and targeted beauty score values. To achieve this, we integrate both a style-based beauty representation (extracted from the reference face) and beauty score prediction (trained on the SCUT-FBP database) into the beautification process. Unlike makeup transfer, our approach targets many-to-many (instead of one-to-one) translation, where multiple outputs can be defined by either different references or varying beauty scores. Extensive experimental results demonstrate the effectiveness and flexibility of the proposed face beautification framework.
Abstract:We present an approach to generate a high-fidelity 3D face avatar with a high-resolution UV texture map from a single image. To estimate the face geometry, we use a deep neural network to directly predict the vertex coordinates of the 3D face model from the given image. The 3D face geometry is further refined by a non-rigid deformation process to capture facial landmarks more accurately before texture projection. A key novelty of our approach is training the shape regression network on facial images synthetically generated using a high-quality rendering engine. Moreover, our shape estimator fully leverages the discriminative power of deep facial identity features learned from millions of facial images. We have conducted extensive experiments that demonstrate the superiority of our optimized 2D-to-3D rendering approach, especially its excellent generalization to real-world selfie images. Our proposed system for rendering 3D avatars from 2D images has a wide range of applications, from virtual/augmented reality (VR/AR) and telepsychiatry to human-computer interaction and social networks.