Abstract:For next-generation green communication systems, this article proposes a communication system based on frequency-diverse array-multiple-input multiple-output (FDA-MIMO) technology, which aims to achieve high data rates while maintaining low power consumption. The system combines frequency offset index realign modulation, multiple-antenna spatial index modulation, and spreading code index modulation. In the proposed generalized code index modulation-aided frequency offset realign multiple-antenna spatial modulation (GCIM-FORMASM) system, the incoming bits are divided into five parts: spatial modulation bits conveyed by activating multiple transmit antennas; frequency offset index bits of the FDA antennas, comprising frequency offset combination bits and frequency offset realign bits; spreading code index modulation bits; and modulated symbol bits. Subsequently, this article uses the orthogonal waveforms transmitted by the FDA to design the corresponding transmitter and receiver structures and provides specific expressions for the received signals. To reduce the decoding complexity of the maximum likelihood (ML) algorithm, we propose a three-stage despreading-based low complexity (DBLC) algorithm that leverages the orthogonality of the spreading codes. Additionally, a closed-form expression for the upper bound of the average bit error probability (ABEP) of the DBLC algorithm is derived. An analysis of metrics such as energy efficiency and data rate shows that the proposed system features low power consumption and high data transmission rates, which align well with the concept of future green communications. The effectiveness of the proposed methods is validated through comprehensive numerical results.
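As a rough illustration of how the incoming bits might be split into the five parts listed in the abstract above, the following sketch counts bits per transmission using conventional index-modulation bit counts. The function and parameter names, the floor-of-log2 formulas, and the simplification of one spreading code index and one modulated symbol per transmission are all assumptions made for illustration; the abstract does not give these expressions.

```python
from math import comb, factorial


def floor_log2(n: int) -> int:
    """Largest k with 2**k <= n, for n >= 1 (exact integer arithmetic)."""
    return n.bit_length() - 1


def bit_budget(n_tx, n_active, n_offsets, n_codes, mod_order):
    """Hypothetical bit split for one transmission (all formulas are assumptions).

    n_tx      -- number of transmit antennas
    n_active  -- number of antennas activated per transmission
    n_offsets -- size of the frequency offset pool
    n_codes   -- number of orthogonal spreading codes
    mod_order -- constellation size of the modulated symbol
    """
    budget = {
        # Which subset of antennas is active.
        "spatial": floor_log2(comb(n_tx, n_active)),
        # Which subset of frequency offsets is used.
        "offset_combination": floor_log2(comb(n_offsets, n_active)),
        # How the chosen offsets are permuted across the active antennas.
        "offset_realign": floor_log2(factorial(n_active)),
        # Which spreading code is selected (assumed: one index per transmission).
        "code_index": floor_log2(n_codes),
        # The conventionally modulated symbol (assumed: one per transmission).
        "symbol": floor_log2(mod_order),
    }
    budget["total"] = sum(budget.values())
    return budget


budget = bit_budget(n_tx=4, n_active=2, n_offsets=4, n_codes=4, mod_order=4)
print(budget)
```

With these illustrative parameters the five parts carry 2, 2, 1, 2, and 2 bits, for a total of 9 bits per transmission.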
Abstract:Received signal strength (RSS)-based cooperative localization has gained significant attention due to its straightforward system architectures and cost-effectiveness. In this paper, we propose Cooperative Localization Techniques (with Unknown Parameters), referred to as CTUP(s), which consider uncertainty in anchor nodes' locations and assume the transmit power and path loss exponent (PLE) to be unknown. Unlike prior studies, CTUP(s) address unknowns by estimating these parameters, along with the location of target nodes. The non-convex and non-linear nature of the maximum likelihood (ML) estimator of the problem is addressed through relaxation techniques, employing Taylor series expansion, semidefinite relaxation (SDR), and the epigraph method. The resulting problem is solved using semidefinite second-order cone programming (SDP-SOCP), leveraging the precision of SDP and the simplicity of SOCP. We deployed an extensive network comprising 50 BLE nodes covering an area of 640~m $\times$ 180~m to gather RSS data. The precise location of the nodes is obtained using real-time kinematics global positioning system (RTK-GPS), which is treated as the ground truth. Furthermore, to replicate real-world scenarios, we recorded the positions of the anchor nodes using a standard GPS, thereby introducing uncertainty into the anchor node locations. Extensive simulation and hardware experimentation demonstrate the superior performance of CTUP compared to existing techniques.
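To make the role of the unknown transmit power and PLE concrete, the sketch below assumes the commonly used log-distance path loss model, RSS = P0 - 10 * PLE * log10(d / d0), and recovers P0 and the PLE by ordinary least squares when the link distances are known. The model, the function name, and the synthetic values are assumptions for illustration. This is not the SDP-SOCP estimator described in the abstract, which additionally estimates the target locations under anchor uncertainty.

```python
from math import log10


def fit_power_and_ple(distances_m, rss_dbm, d0_m=1.0):
    """Least-squares fit of (P0, PLE) under the assumed log-distance model.

    Model (assumption): rss = P0 - 10 * PLE * log10(d / d0).
    With x = -10 * log10(d / d0) this is the line rss = P0 + PLE * x.
    """
    xs = [-10.0 * log10(d / d0_m) for d in distances_m]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(rss_dbm) / n
    # Slope of the fitted line is the PLE, intercept is the reference power P0.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, rss_dbm))
    ple = sxy / sxx
    p0 = mean_y - ple * mean_x
    return p0, ple


# Noise-free synthetic links (illustrative values, not measured data).
distances = [1.0, 2.0, 5.0, 10.0, 20.0]
true_p0, true_ple = -40.0, 3.0
rss = [true_p0 - 10.0 * true_ple * log10(d) for d in distances]

est_p0, est_ple = fit_power_and_ple(distances, rss)
print(est_p0, est_ple)
```

In a localization setting the distances themselves depend on the unknown target positions, which is what makes the joint ML problem non-convex and motivates the relaxations named above.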
Abstract:We propose a novel-view augmentation (NOVA) strategy to train NeRFs for photo-realistic 3D composition of dynamic objects in a static scene. Compared to prior work, our framework significantly reduces blending artifacts when inserting multiple dynamic objects into a 3D scene at novel views and times; achieves comparable PSNR without the need for additional ground truth modalities like optical flow; and overall provides ease, flexibility, and scalability in neural composition. Our codebase is on GitHub.
Abstract:Methods for 3D lane detection have recently been proposed to address the issue of inaccurate lane layouts in many autonomous driving scenarios (uphill/downhill, bump, etc.). Previous work struggled in complex cases due to its simple design of the spatial transformation between front view and bird's eye view (BEV) and the lack of a realistic dataset. To address these issues, we present PersFormer: an end-to-end monocular 3D lane detector with a novel Transformer-based spatial feature transformation module. Our model generates BEV features by attending to related front-view local regions with camera parameters as a reference. PersFormer adopts a unified 2D/3D anchor design and an auxiliary task to detect 2D/3D lanes simultaneously, enhancing feature consistency and sharing the benefits of multi-task learning. Moreover, we release one of the first large-scale real-world 3D lane datasets, called OpenLane, with high-quality annotation and scenario diversity. OpenLane contains 200,000 frames, over 880,000 instance-level lanes, and 14 lane categories, along with scene tags and the closed-in-path object annotations, to encourage the development of lane detection and more industry-related autonomous driving methods. We show that PersFormer significantly outperforms competitive baselines in the 3D lane detection task on our new OpenLane dataset as well as on the Apollo 3D Lane Synthetic dataset, and is also on par with state-of-the-art algorithms in the 2D task on OpenLane. The project page is available at https://github.com/OpenPerceptionX/PersFormer_3DLane and the OpenLane dataset is provided at https://github.com/OpenPerceptionX/OpenLane.
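One plausible way to use camera parameters as a reference when relating BEV cells to front-view regions is to project each BEV ground point into the image and treat the resulting pixel as the center of the region to attend to. The sketch below does this for a pinhole camera, assuming flat ground and zero camera pitch and roll. The function name, the coordinate conventions, and the intrinsic values are hypothetical, and this is not PersFormer's actual transformation module.

```python
def bev_to_image(x_fwd, y_left, cam_height, fx, fy, cx, cy):
    """Project a flat-ground BEV point to front-view pixel coordinates.

    Assumptions (illustrative only): ground frame with x forward and y left,
    camera mounted cam_height above the ground, looking straight ahead with
    zero pitch and roll, ideal pinhole model without lens distortion.
    """
    if x_fwd <= 0:
        raise ValueError("point must lie in front of the camera")
    # Camera frame: X right, Y down, Z forward.
    x_cam = -y_left
    y_cam = cam_height
    z_cam = x_fwd
    # Standard pinhole projection.
    u = fx * x_cam / z_cam + cx
    v = fy * y_cam / z_cam + cy
    return u, v


# Hypothetical intrinsics for a 1920x1080 image.
u, v = bev_to_image(x_fwd=10.0, y_left=0.0, cam_height=1.5,
                    fx=1000.0, fy=1000.0, cx=960.0, cy=540.0)
print(u, v)
```

Points farther ahead project closer to the horizon row `cy`, which is why a fixed flat-ground mapping degrades on uphill or downhill roads and a learned, attention-based transformation is attractive.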
Abstract:Sequential recommendation has been a widely popular topic in recommender systems. Existing works have enhanced the prediction ability of sequential recommendation systems using various methods, such as recurrent networks and self-attention mechanisms. However, they fail to discover and distinguish the various relationships between items, which could be underlying factors that motivate user behaviors. In this paper, we propose an Edge-Enhanced Global Disentangled Graph Neural Network (EGD-GNN) model to capture the relation information between items for global item representation and local user intention learning. At the global level, we build a global-link graph over all sequences to model item relationships. Then a channel-aware disentangled learning layer is designed to decompose edge information into different channels, which can be aggregated to represent the target item from its neighbors. At the local level, we apply a variational auto-encoder framework to learn user intention over the current sequence. We evaluate our proposed method on three real-world datasets. Experimental results show that our model achieves a significant improvement over state-of-the-art baselines and is able to distinguish item features.
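The global-link graph over all sequences can be pictured with a minimal sketch. It assumes that an edge connects each item to the item that immediately follows it in any sequence, weighted by how often that transition occurs. The abstract does not specify the construction, so the edge rule, the weighting, and the function name are assumptions for illustration.

```python
from collections import defaultdict


def build_global_link_graph(sequences):
    """Build a weighted directed item graph from all interaction sequences.

    Assumption: an edge src -> dst is added for every pair of adjacent items,
    and its weight counts how many times that transition is observed.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        # Pair each item with its immediate successor in the same sequence.
        for src, dst in zip(seq, seq[1:]):
            if src != dst:  # skip self-loops from repeated items
                graph[src][dst] += 1
    # Convert to plain dicts so the result is easy to inspect.
    return {src: dict(nbrs) for src, nbrs in graph.items()}


# Toy sequences with hypothetical item IDs.
sessions = [["a", "b", "c"], ["a", "b", "d"], ["b", "c"]]
graph = build_global_link_graph(sessions)
print(graph)
```

In such a graph each edge carries only a count. A disentangling layer of the kind described above would go further and split the information on each edge into several channels before aggregating neighbors.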
Abstract:Depth completion aims to infer a dense depth image from sparse depth measurements, since glossy, transparent, or distant surfaces cannot be scanned properly by the sensor. Most existing methods directly interpolate the missing depth measurements based on pixel-wise image content and the corresponding neighboring depth values. Consequently, they produce blurred boundaries or inaccurate object structure. To address these problems, we propose a novel self-guided instance-aware network (SG-IANet) that: (1) utilizes a self-guided mechanism to extract the instance-level features needed for depth restoration, (2) incorporates geometric and context information into network learning to conform to the underlying constraints for edge clarity and structure consistency, (3) regularizes the depth estimation and mitigates the impact of noise through instance-aware learning, and (4) is trained with synthetic data only, using domain randomization to bridge the reality gap. Extensive experiments on synthetic and real-world datasets demonstrate that our proposed method outperforms previous works. Further ablation studies give more insight into the proposed method and demonstrate the generalization capability of our model.