Abstract: The field of 3D object detection from point clouds is advancing rapidly in computer vision, aiming to detect and localize objects in three-dimensional space accurately and efficiently. Current 3D detectors commonly fall short in flexibility and scalability, leaving ample room for performance improvements. In this paper, we address these limitations by introducing two frameworks for 3D object detection with minimal hand-crafted design. First, we propose CT3D, which sequentially performs raw-point-based embedding, a standard Transformer encoder, and a channel-wise decoder for the point features within each proposal. Second, we present an enhanced network, CT3D++, which incorporates geometric and semantic fusion-based embedding to extract more valuable and comprehensive proposal-aware information. Additionally, CT3D++ uses a point-to-key bidirectional encoder for more efficient feature encoding at a reduced computational cost. By replacing the corresponding components of CT3D with these novel modules, CT3D++ achieves state-of-the-art performance on both the KITTI dataset and the large-scale Waymo Open Dataset. The source code for our frameworks will be made available at https://github.com/hlsheng1/CT3D-plusplus.
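For readers less familiar with the "standard Transformer encoder" step, the following is a minimal NumPy sketch of single-head scaled dot-product self-attention over the point features of one proposal. The feature width, the random projection matrices and the name `self_attention` are illustrative assumptions. This is not CT3D's encoder, and neither the channel-wise decoder nor CT3D++'s point-to-key bidirectional encoder is shown.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x: (n_points, d) point features inside one proposal (hypothetical layout).
    w_q, w_k, w_v: (d, d) projection matrices (hypothetical, random here).
    Returns the attended features (n_points, d) and the attention map (n_points, n_points).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])       # pairwise point-to-point similarity
    scores -= scores.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # row-wise softmax
    return attn @ v, attn

rng = np.random.default_rng(0)
n_points, d = 128, 32
x = rng.standard_normal((n_points, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

The n_points × n_points attention map grows quadratically with the number of points in a proposal. This is one reason a cheaper encoder can pay off.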
Abstract: Hybrid beamforming is vital in modern wireless systems, especially in massive MIMO and millimeter-wave (mmWave) deployments, as it offers efficient directional transmission with reduced hardware complexity. However, effective beamforming in multi-user scenarios relies heavily on accurate channel state information, whose acquisition often incurs excessive pilot overhead and degrades system performance. To address this issue, and inspired by the spatial congruence between sub-6GHz (sub-6G) and mmWave channels, we propose a Sub-6G information Aided Multi-User Hybrid Beamforming (SA-MUHBF) framework that avoids excessive pilot use. SA-MUHBF employs a convolutional neural network to predict the mmWave beamspace from the sub-6G channel estimate. It then applies a novel multi-layer graph neural network for analog beam selection and a linear minimum mean-square error algorithm for digital beamforming. Numerical results demonstrate that SA-MUHBF efficiently predicts the mmWave beamspace representation and achieves higher spectral efficiency than state-of-the-art benchmarks. Moreover, SA-MUHBF performs robustly across varied sub-6G system configurations and generalizes well to unseen scenarios.
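To illustrate the digital-beamforming stage, here is a minimal sketch of one common linear MMSE (regularized zero-forcing) precoder, W ∝ H^H (H H^H + (Kσ²/P) I)^{-1}. It is applied to an effective channel that is assumed to be given after analog beam selection. The regularization term, the total-power normalization, the dimensions and the name `lmmse_precoder` are assumptions; the paper's exact formulation may differ.

```python
import numpy as np

def lmmse_precoder(h_eff, noise_var, total_power):
    """Linear MMSE (regularized zero-forcing) digital precoder (sketch).

    h_eff: (K, N_RF) effective channel after analog beam selection (assumed given).
    Returns W of shape (N_RF, K), scaled so that ||W||_F^2 == total_power.
    """
    k = h_eff.shape[0]
    reg = k * noise_var / total_power                 # common regularization choice (assumption)
    gram = h_eff @ h_eff.conj().T + reg * np.eye(k)   # (K, K)
    w = h_eff.conj().T @ np.linalg.inv(gram)          # (N_RF, K)
    return w * np.sqrt(total_power) / np.linalg.norm(w, "fro")

rng = np.random.default_rng(1)
K, N_RF = 4, 4
h = (rng.standard_normal((K, N_RF)) + 1j * rng.standard_normal((K, N_RF))) / np.sqrt(2)
w = lmmse_precoder(h, noise_var=1e-9, total_power=1.0)
# At very high SNR the precoder approaches zero-forcing, so H @ W is nearly diagonal.
hw = h @ w
```

As the noise variance grows, the regularization term dominates and the precoder drifts from zero-forcing toward matched filtering. This trade-off is what makes the MMSE form attractive in multi-user settings.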
Abstract: Owing to its superior error-correction performance, the successive cancellation list (SCL) algorithm is widely regarded as one of the most promising decoding algorithms for polar codes with short-to-moderate code lengths. However, its sequential nature limits the application of SCL decoding in low-latency communication scenarios. To reduce decoding latency, a promising solution is to develop fast and efficient list decoding algorithms tailored to specific polar constituent codes (special nodes). Recently, fast list decoding algorithms have been proposed for special nodes with low code rates. To further speed up SCL decoding, this paper presents fast list decoding algorithms for two types of high-rate special nodes, namely single-parity-check (SPC) nodes and sequence of rate-one or single-parity-check (SR1/SPC) nodes. In particular, we develop two classes of fast list decoding algorithms for these nodes. The first uses a sequential decoding procedure whose latency is linear in the list size. The second further parallelizes the decoding process by pre-determining the redundant candidate paths offline. Simulation results show that the proposed list decoding algorithms achieve up to 70.7% lower decoding latency than state-of-the-art fast SCL decoders while exhibiting the same error-correction performance.
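A fast SPC list decoder must return the same candidates as brute force, so a small exhaustive reference is useful for checking one. The sketch below enumerates every even-parity codeword of a short SPC node. It scores each codeword with the common LLR-based path-metric approximation: the sum of |LLR| over positions that disagree with the hard decision. It then keeps the `list_size` best. This is a hypothetical test oracle, not the paper's fast algorithm, and it is only practical for small node lengths.

```python
from itertools import product

def hard(llr):
    """Hard decision: bit 0 for a non-negative LLR, 1 otherwise."""
    return 0 if llr >= 0 else 1

def spc_list_reference(llrs, list_size):
    """Exhaustive list decoding of a single-parity-check (SPC) node.

    Returns the list_size lowest-penalty (penalty, codeword) pairs, where the
    penalty is the LLR-based path-metric increment of that codeword.
    """
    scored = []
    for bits in product((0, 1), repeat=len(llrs)):
        if sum(bits) % 2:  # SPC constraint: only even-parity codewords are valid
            continue
        penalty = sum(abs(l) for l, b in zip(llrs, bits) if b != hard(l))
        scored.append((penalty, bits))
    scored.sort()
    return scored[:list_size]

llrs = [2.1, -0.4, 1.7, 0.3, -3.0, 0.9, 1.2, -0.8]
for penalty, codeword in spc_list_reference(llrs, list_size=4):
    print(round(penalty, 2), codeword)
```

Here the hard decisions have odd parity, so every surviving candidate flips an odd number of bits. The best four each flip a single low-reliability position.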
Abstract: In this paper, we investigate a self-sensing intelligent reflecting surface (IRS) aided millimeter wave (mmWave) integrated sensing and communication (ISAC) system. Unlike a conventional, purely passive IRS, a self-sensing IRS can effectively reduce the path loss of sensing-related links, which makes it advantageous in ISAC systems. To jointly sense the target/scatterer/user positions and estimate the sensing and communication (SAC) channels in the considered system, we propose a two-phase transmission scheme. Coarse sensing/channel estimation (CE) results are obtained in the first phase, using scanning-based IRS reflection coefficients, and refined results are obtained in the second phase, using optimized IRS reflection coefficients. For each phase, we develop an angle-based sensing turbo variational Bayesian inference (AS-TVBI) algorithm, which combines the VBI, message passing and expectation-maximization (EM) methods, to solve the joint location sensing and CE problem. The proposed algorithm effectively exploits the partially overlapping structured (POS) sparsity and the 2-dimensional (2D) block sparsity inherent in the SAC channels to enhance overall performance. Based on the first-phase estimation results, we formulate a Cramér-Rao bound (CRB) minimization problem for optimizing the IRS reflection coefficients. Through proper reformulations, we then propose a low-complexity manifold-based optimization algorithm to solve it. Simulation results verify the superiority of the proposed transmission scheme and associated algorithms.
Abstract: Since Intersection-over-Union (IoU) based optimization keeps the training losses consistent with the final IoU prediction metric, it has been widely used in both the regression and classification branches of single-stage 2D object detectors. Recently, several 3D object detection methods have adopted IoU-based optimization by directly replacing the 2D IoU with the 3D IoU. However, computing IoU directly in 3D is very costly owing to its complex implementation and inefficient backward operations. Moreover, 3D IoU-based optimization is sub-optimal because it is sensitive to rotation, which can cause training instability and degrade detection performance. In this paper, we propose a novel Rotation-Decoupled IoU (RDIoU) method that mitigates the rotation-sensitivity issue and produces more efficient optimization objectives than 3D IoU during training. Specifically, RDIoU simplifies the complex interactions among regression parameters by decoupling the rotation variable as an independent term, while preserving the geometry of 3D IoU. By incorporating RDIoU into both the regression and classification branches, the network is encouraged to learn more precise bounding boxes while overcoming the misalignment between classification and regression. Extensive experiments on the KITTI and Waymo Open Dataset benchmarks validate that RDIoU brings substantial improvements to single-stage 3D object detection.
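To show why removing rotation from the overlap computation makes IoU cheap, here is a sketch of IoU for axis-aligned boxes given by per-axis centers and sizes. Each axis contributes an independent interval overlap, so the whole computation is a handful of min/max/product operations. The function name and the box parameterization are assumptions, and this is not the RDIoU formula from the paper.

```python
def axis_aligned_iou(center_a, size_a, center_b, size_b):
    """IoU of two axis-aligned boxes described by per-axis centers and sizes.

    Works for any number of axes; with three axes it is the rotation-free 3D IoU.
    Only min/max/product operations are used, so the same formula is piecewise
    differentiable when written in an autodiff framework.
    """
    inter = 1.0
    vol_a = vol_b = 1.0
    for ca, sa, cb, sb in zip(center_a, size_a, center_b, size_b):
        lo = max(ca - sa / 2, cb - sb / 2)
        hi = min(ca + sa / 2, cb + sb / 2)
        inter *= max(0.0, hi - lo)  # overlap length along this axis
        vol_a *= sa
        vol_b *= sb
    return inter / (vol_a + vol_b - inter)

# Two 2x2x2 cubes shifted by half their width along x: intersection 4, union 12, IoU 1/3.
print(axis_aligned_iou((0, 0, 0), (2, 2, 2), (1, 0, 0), (2, 2, 2)))
```

By contrast, rotated 3D IoU requires clipping two rotated rectangles against each other in the ground plane. That step is the costly, rotation-sensitive part the abstract refers to.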
Abstract: Owing to the sequential nature of the successive-cancellation (SC) algorithm, the decoding of polar codes suffers from significant latency. Fast SC decoding speeds up the SC decoding process by implementing parallel decoders at the intermediate levels of the SC decoding tree for special nodes with specific information and frozen bit patterns. To further improve the parallelism of SC decoding, this paper presents a new class of special nodes composed of a sequence of rate-one or single-parity-check (SR1/SPC) nodes, which encompasses a wide variety of existing special node types. We then analyse the parity constraints caused by the frozen bits in each descendant node, so that the decoding performance of the SR1/SPC node is preserved once these constraints are satisfied. Finally, we propose a generalized fast SC decoding algorithm for the SR1/SPC node that takes the corresponding parity constraints into account. Simulation results show that, compared with state-of-the-art fast SC decoding algorithms, the proposed algorithm achieves a higher degree of parallelism, especially for high-rate polar codes, without tangibly altering the error-correction performance.
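To make "parallel decoders at intermediate levels" concrete, here is a minimal pure-Python sketch of min-sum SC decoding. It stops recursing at three classic special nodes (Rate-0, Rate-1 and SPC) and decodes each of them in one shot. This is a generic fast-SC baseline of the kind the proposed SR1/SPC node generalizes, not the paper's algorithm. The function names, the frozen-mask convention and the N = 8 example are illustrative assumptions.

```python
def f(a, b):
    """Min-sum check-node update (LLRs passed to the left child)."""
    return [(1.0 if x * y >= 0 else -1.0) * min(abs(x), abs(y)) for x, y in zip(a, b)]

def g(a, b, beta_left):
    """Variable-node update (LLRs passed to the right child) given the left codeword."""
    return [y + (1 - 2 * u) * x for x, y, u in zip(a, b, beta_left)]

def hard(llrs):
    return [0 if l >= 0 else 1 for l in llrs]

def encode(u):
    """Polar transform with kernel F = [[1, 0], [1, 1]]; it is its own inverse over GF(2)."""
    if len(u) == 1:
        return list(u)
    half = len(u) // 2
    left, right = encode(u[:half]), encode(u[half:])
    return [l ^ r for l, r in zip(left, right)] + right

def fast_sc_decode(llrs, frozen):
    """Return the estimated codeword of the (sub)code described by the frozen mask."""
    n = len(llrs)
    if all(frozen):                        # Rate-0 node: all-zero codeword
        return [0] * n
    if not any(frozen):                    # Rate-1 node: hard decisions are ML
        return hard(llrs)
    if frozen[0] and not any(frozen[1:]):  # SPC node: Wagner decoding (ML)
        beta = hard(llrs)
        if sum(beta) % 2:                  # odd parity: flip the least reliable bit
            weakest = min(range(n), key=lambda i: abs(llrs[i]))
            beta[weakest] ^= 1
        return beta
    half = n // 2                          # any other node: recurse as in plain SC
    a, b = llrs[:half], llrs[half:]
    beta_l = fast_sc_decode(f(a, b), frozen[:half])
    beta_r = fast_sc_decode(g(a, b, beta_l), frozen[half:])
    return [l ^ r for l, r in zip(beta_l, beta_r)] + beta_r

frozen = [True, True, True, False, True, False, False, False]
u = [0, 0, 0, 1, 0, 1, 1, 0]                   # frozen positions carry zeros
x = encode(u)
llrs = [4.0 * (1 - 2 * bit) for bit in x]      # noiseless BPSK LLRs (0 -> +)
x_hat = fast_sc_decode(llrs, frozen)
u_hat = encode(x_hat)                          # re-encoding inverts the transform
```

With this frozen set, the right half of the tree is a length-4 SPC node and the left half splits into a Rate-0 node and a length-2 SPC node. No bit-by-bit recursion is needed below those nodes.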
Abstract: For intelligent reflecting surface (IRS)-aided wireless communications, channel estimation is essential and usually requires excessive channel training overhead when the number of IRS reflecting elements is large. Acquiring accurate channel state information (CSI) becomes even more challenging when the channel is not quasi-static owing to the mobility of the transmitter and/or receiver. In this work, we study an IRS-aided wireless communication system with a time-varying channel model and propose an innovative two-stage transmission protocol. In the first stage, pilot symbols are sent and the direct/reflected channels are tracked based on the received signal, after which data signals are transmitted. In the second stage, instead of sending pilot symbols first, we directly predict the direct/reflected channels, so that all time slots are used for data transmission. Based on this protocol, we propose a two-stage channel tracking and prediction (2SCTP) scheme. It obtains the direct and reflected channels with low channel training overhead by exploiting the temporal correlation of the time-varying channels. Specifically, we first consider a special case in which the IRS-access point (AP) channel is assumed to be static. For this case, a Kalman filter (KF)-based algorithm and a long short-term memory (LSTM)-based neural network are proposed for channel tracking and prediction, respectively. Then, for the more general case in which the IRS-AP, user-IRS and user-AP channels are all assumed to be time-varying, we present a generalized KF (GKF)-based channel tracking algorithm that employs proper approximations to handle the underlying non-Gaussian random variables. Numerical simulations verify the effectiveness of the proposed transmission protocol and channel tracking/prediction algorithms compared with existing ones.
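As a minimal illustration of how a Kalman filter exploits temporal correlation for channel tracking, the sketch below tracks a single scalar channel. The channel is assumed to follow a first-order Gauss-Markov (AR(1)) model, h_t = ρ h_{t-1} + sqrt(1 - ρ²) w_t, observed through known pilots. The scalar model, the value of ρ, the noise level and the name `kalman_track` are assumptions for illustration only. The paper's KF operates on the IRS-aided direct and reflected channels, which are not modeled here.

```python
import numpy as np

def kalman_track(y, pilots, rho, noise_var):
    """Scalar complex Kalman filter for h_t = rho*h_{t-1} + sqrt(1-rho^2)*w_t,
    observed as y_t = p_t*h_t + n_t, with w_t ~ CN(0, 1) and n_t ~ CN(0, noise_var)."""
    h_est, p_err = 0.0 + 0.0j, 1.0  # prior on the initial channel: CN(0, 1)
    estimates = []
    for y_t, p_t in zip(y, pilots):
        # Predict: propagate the estimate and its error variance through the AR(1) model.
        h_pred = rho * h_est
        p_pred = rho**2 * p_err + (1 - rho**2)
        # Update: blend the prediction with the new pilot observation.
        gain = p_pred * np.conj(p_t) / (abs(p_t) ** 2 * p_pred + noise_var)
        h_est = h_pred + gain * (y_t - p_t * h_pred)
        p_err = (1 - np.real(gain * p_t)) * p_pred
        estimates.append(h_est)
    return np.array(estimates)

rng = np.random.default_rng(2)

def cn(size=None):
    """Circularly-symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)

T, rho, noise_var = 200, 0.99, 0.1
h = np.empty(T, dtype=complex)
prev = cn()
for t in range(T):
    prev = rho * prev + np.sqrt(1 - rho**2) * cn()
    h[t] = prev
pilots = np.ones(T)
y = pilots * h + np.sqrt(noise_var) * cn(T)
est = kalman_track(y, pilots, rho, noise_var)
mse_kf = np.mean(np.abs(est - h) ** 2)
mse_ls = np.mean(np.abs(y / pilots - h) ** 2)  # per-slot, pilot-only estimate
print(f"KF MSE {mse_kf:.4f} vs per-slot LS MSE {mse_ls:.4f}")
```

Because consecutive channel realizations are strongly correlated, the filter averages information across slots. Its error therefore settles well below the per-slot estimate, whose MSE stays near the noise variance.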
Abstract: Although 3D object detection from point clouds has progressed rapidly in recent years, the lack of flexible and high-performance proposal refinement remains a great hurdle for existing state-of-the-art two-stage detectors. Previous works on refining 3D proposals have relied on human-designed components such as keypoint sampling, set abstraction and multi-scale feature fusion to produce powerful 3D object representations. Such methods, however, have limited ability to capture rich contextual dependencies among points. In this paper, we combine a high-quality region proposal network with a Channel-wise Transformer architecture to constitute our two-stage 3D object detection framework (CT3D) with minimal hand-crafted design. CT3D simultaneously performs proposal-aware embedding and channel-wise context aggregation for the point features within each proposal. Specifically, CT3D uses the proposal's keypoints for spatial contextual modelling and learns attention propagation in the encoding module, mapping the proposal to point embeddings. Next, a new channel-wise decoding module enriches the query-key interaction via channel-wise re-weighting to effectively merge multi-level contexts, which contributes to more accurate object predictions. Extensive experiments demonstrate that CT3D achieves superior performance and excellent scalability. Remarkably, CT3D achieves an AP of 81.77% in the moderate car category on the KITTI test 3D detection benchmark, outperforming state-of-the-art 3D detectors.
Abstract: We consider a two-stage stochastic optimization problem in which a long-term optimization variable is coupled with a set of short-term optimization variables in both the objective and constraint functions. Although two-stage stochastic optimization plays a critical role in various engineering and scientific applications, efficient algorithms are still lacking, especially when the long-term and short-term variables are coupled in the constraints. To overcome the challenge caused by tightly coupled stochastic constraints, we first establish a two-stage primal-dual decomposition (PDD) method that decomposes the two-stage problem into a long-term problem and a family of short-term subproblems. We then propose a PDD-based stochastic successive convex approximation (PDD-SSCA) algorithmic framework to find KKT solutions of two-stage stochastic optimization problems. At each iteration, PDD-SSCA first runs a short-term sub-algorithm to find stationary points of the short-term subproblems associated with a mini-batch of state samples. It then constructs a convex surrogate for the long-term problem based on deep unrolling of the short-term sub-algorithm and backpropagation. Finally, the convex surrogate problem is solved, and its optimal solution is used to generate the next iterate. We establish the almost sure convergence of PDD-SSCA and customize the framework to solve two important application problems. Simulations show that PDD-SSCA achieves superior performance over existing solutions.
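The "construct a convex surrogate, solve it, then move toward its solution" loop can be illustrated on a one-dimensional toy problem. The sketch minimizes E[(x - s)²] over a box, with s ~ N(3, 1). It uses a common SSCA construction: a recursively averaged stochastic gradient plus a proximal term τ(z - x)² forms the surrogate, and the iterate is smoothed toward the surrogate minimizer. The toy objective, τ, the step-size exponents and the name `ssca_toy` are assumptions. The PDD decomposition and the deep-unrolled short-term sub-algorithm are not modeled.

```python
import numpy as np

def ssca_toy(num_iters=2000, tau=1.0, lower=-5.0, upper=5.0, seed=0):
    """SSCA sketch for min_x E[(x - s)^2] s.t. lower <= x <= upper, with s ~ N(3, 1)."""
    rng = np.random.default_rng(seed)
    x, grad_avg = 0.0, 0.0
    for t in range(num_iters):
        rho = (t + 1) ** -0.6     # surrogate averaging weight
        gamma = (t + 1) ** -0.9   # iterate smoothing step; gamma / rho -> 0
        s = rng.normal(3.0, 1.0)  # one state sample (mini-batch of size 1)
        grad_avg = (1 - rho) * grad_avg + rho * 2.0 * (x - s)
        # Convex surrogate grad_avg*(z - x) + tau*(z - x)^2; its box-constrained
        # minimizer is the unconstrained minimizer clipped to [lower, upper].
        x_bar = float(np.clip(x - grad_avg / (2 * tau), lower, upper))
        x = (1 - gamma) * x + gamma * x_bar
    return x

print(round(ssca_toy(), 2), round(ssca_toy(upper=2.0), 2))
```

The two step sizes play different roles. rho averages out sampling noise in the surrogate, while the smaller gamma keeps the iterate from chasing that noise, which is why gamma / rho is chosen to vanish.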
Abstract: The intelligent reflecting surface (IRS) has emerged as a promising paradigm for improving the capacity and reliability of wireless communication systems by smartly reconfiguring the wireless propagation environment. Realizing the promised gains of IRS requires the acquisition of channel state information (CSI). This is difficult in practice because an IRS generally has no transmit/receive radio frequency (RF) chains and only limited signal processing capability. In this paper, we study the uplink channel estimation problem for an IRS-aided multiuser single-input multi-output (SIMO) system and propose a novel two-phase channel estimation (2PCE) strategy. The existing three-phase approach suffers from error propagation, i.e., channel estimation errors in earlier phases deteriorate the estimation performance in later phases. 2PCE alleviates this effect and improves channel estimation performance with the same amount of channel training overhead as the existing approach. Moreover, we analyze the asymptotic mean squared error (MSE) of the 2PCE strategy when the least-squares (LS) channel estimation method is employed, and show that 2PCE can outperform the existing approach. Finally, extensive simulation results validate the effectiveness of the 2PCE strategy.
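For reference, here is a minimal sketch of the least-squares estimator in a generic multi-user uplink model Y = H X + N. In this model, M base-station antennas receive K users' known pilots over τ ≥ K slots, and the estimate is Ĥ = Y X^H (X X^H)^{-1}. The dimensions, the DFT pilot choice and the name `ls_estimate` are assumptions. The IRS cascaded-channel structure and the phase split of 2PCE are not modeled. With orthogonal pilots (X X^H = τI), the per-entry LS error variance is σ²/τ.

```python
import numpy as np

def ls_estimate(y, x):
    """Least-squares estimate of H from Y = H X + N.

    y: (M, tau) received pilot signals; x: (K, tau) known pilots with tau >= K.
    """
    return y @ x.conj().T @ np.linalg.inv(x @ x.conj().T)

rng = np.random.default_rng(3)
M, K, tau, noise_var = 8, 4, 8, 0.01
# Orthogonal pilots: the first K rows of a tau-point DFT matrix, so X X^H = tau * I.
x = np.exp(-2j * np.pi * np.outer(np.arange(K), np.arange(tau)) / tau)
h = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
n = np.sqrt(noise_var / 2) * (rng.standard_normal((M, tau)) + 1j * rng.standard_normal((M, tau)))
h_hat = ls_estimate(h @ x + n, x)
mse = np.mean(np.abs(h_hat - h) ** 2)
print(f"empirical MSE {mse:.5f}, theoretical sigma^2/tau {noise_var / tau:.5f}")
```

Longer pilot sequences reduce the LS error in proportion to τ, which is why training overhead and estimation accuracy trade off directly in IRS systems.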