Abstract:Multimodal large language models (MLLMs) have shown satisfactory effects in many autonomous driving tasks. In this paper, MLLMs are utilized to solve joint semantic scene understanding and risk localization tasks, while only relying on front-view images. In the proposed MLLM-SUL framework, a dual-branch visual encoder is first designed to extract features from two resolutions, and rich visual information is conducive to the language model describing risk objects of different sizes accurately. Then for the language generation, LLaMA model is fine-tuned to predict scene descriptions, containing the type of driving scenario, actions of risk objects, and driving intentions and suggestions of ego-vehicle. Ultimately, a transformer-based network incorporating a regression token is trained to locate the risk objects. Extensive experiments on the existing DRAMA-ROLISP dataset and the extended DRAMA-SRIS dataset demonstrate that our method is efficient, surpassing many state-of-the-art image-based and video-based methods. Specifically, our method achieves 80.1% BLEU-1 score and 298.5% CIDEr score in the scene understanding task, and 59.6% accuracy in the localization task. Codes and datasets are available at https://github.com/fjq-tongji/MLLM-SUL.
Abstract:In this paper, we propose a novel covariance information-assisted channel state information (CSI) feedback scheme for frequency-division duplex (FDD) massive multi-input multi-output (MIMO) systems. Unlike most existing CSI feedback schemes, which rely on instantaneous CSI only, the proposed CovNet leverages CSI covariance information to achieve high-performance CSI reconstruction, primarily consisting of convolutional neural network (CNN) and Transformer architecture. To efficiently utilize covariance information, we propose a covariance information processing procedure and sophisticatedly design the covariance information processing network (CIPN) to further process it. Moreover, the feed-forward network (FFN) in CovNet is designed to jointly leverage the 2D characteristics of the CSI matrix in the angle and delay domains. Simulation results demonstrate that the proposed network effectively leverages covariance information and outperforms the state-of-the-art (SOTA) scheme across the full compression ratio (CR) range.
Abstract:In moderate- to high-mobility scenarios, channel state information (CSI) varies rapidly and becomes temporally non-stationary, leading to significant performance degradation in channel reciprocity-dependent massive multiple-input multiple-output (MIMO) transmission. To address this challenge, we propose a tensor-structured approach to dynamic channel prediction (TS-DCP) for massive MIMO systems with temporal non-stationarity, leveraging dual-timescale and cross-domain correlations. Specifically, due to the inherent spatial consistency, non-stationary channels on long-timescales are treated as stationary on short-timescales, decoupling complicated correlations into more tractable dual-timescale ones. To exploit such property, we frame the pilot symbols, capturing short-timescale correlations within frames by Doppler domain modeling and long-timescale correlations across frames by Markov/autoregressive processes. Based on this, we develop the tensor-structured signal model in the spatial-frequency-temporal domain, incorporating correlated angle-delay-Doppler domain channels and Vandermonde-structured factor matrices. Furthermore, we model cross-domain correlations within each frame, arising from clustered scatterer distributions, using tensor-structured upgradations of Markov processes and coupled Gaussian distributions. Following these probabilistic models, we formulate the TS-DCP as the variational free energy (VFE) minimization problem, designing trial belief structures through online approximation and the Bethe method. This yields the online TS-DCP algorithm derived from a dual-layer VFE optimization process, where both outer and inner layers leverage the multilinear structure of channels to reduce computational complexity significantly. Numerical simulations demonstrate the significant superiority of the proposed algorithm over benchmarks in terms of channel prediction performance.
Abstract:The unsourced random access (URA) has emerged as a viable scheme for supporting the massive machine-type communications (mMTC) in the sixth generation (6G) wireless networks. Notably, the tensor-based URA (TURA), with its inherent tensor structure, stands out by simultaneously enhancing performance and reducing computational complexity for the multi-user separation, especially in mMTC networks with a large numer of active devices. However, current TURA scheme lacks the soft decoder, thus precluding the incorporation of existing advanced coding techniques. In order to fully explore the potential of the TURA, this paper investigates the Polarcoded TURA (PTURA) scheme and develops the corresponding iterative Bayesian receiver with feedback (IBR-FB). Specifically, in the IBR-FB, we propose the Grassmannian modulation-aided Bayesian tensor decomposition (GM-BTD) algorithm under the variational Bayesian learning (VBL) framework, which leverages the property of the Grassmannian modulation to facilitate the convergence of the VBL process, and has the ability to generate the required soft information without the knowledge of the number of active devices. Furthermore, based on the soft information produced by the GM-BTD, we design the soft Grassmannian demodulator in the IBR-FB. Extensive simulation results demonstrate that the proposed PTURA in conjunction with the IBR-FB surpasses the existing state-of-the-art unsourced random access scheme in terms of accuracy and computational complexity.
Abstract:In this paper, we propose a unified framework based on equivariance for the design of artificial intelligence (AI)-assisted technologies in multi-user multiple-input-multiple-output (MU-MIMO) systems. We first provide definitions of multidimensional equivariance, high-order equivariance, and multidimensional invariance (referred to collectively as tensor equivariance). On this basis, by investigating the design of precoding and user scheduling, which are key techniques in MU-MIMO systems, we delve deeper into revealing tensor equivariance of the mappings from channel information to optimal precoding tensors, precoding auxiliary tensors, and scheduling indicators, respectively. To model mappings with tensor equivariance, we propose a series of plug-and-play tensor equivariant neural network (TENN) modules, where the computation involving intricate parameter sharing patterns is transformed into concise tensor operations. Building upon TENN modules, we propose the unified tensor equivariance framework that can be applicable to various communication tasks, based on which we easily accomplish the design of corresponding AI-assisted precoding and user scheduling schemes. Simulation results demonstrate that the constructed precoding and user scheduling methods achieve near-optimal performance while exhibiting significantly lower computational complexity and generalization to inputs with varying sizes across multiple dimensions. This validates the superiority of TENN modules and the unified framework.
Abstract:Vehicle-to-everything-aided autonomous driving (V2X-AD) has a huge potential to provide a safer driving solution. Despite extensive researches in transportation and communication to support V2X-AD, the actual utilization of these infrastructures and communication resources in enhancing driving performances remains largely unexplored. This highlights the necessity of collaborative autonomous driving: a machine learning approach that optimizes the information sharing strategy to improve the driving performance of each vehicle. This effort necessitates two key foundations: a platform capable of generating data to facilitate the training and testing of V2X-AD, and a comprehensive system that integrates full driving-related functionalities with mechanisms for information sharing. From the platform perspective, we present V2Xverse, a comprehensive simulation platform for collaborative autonomous driving. This platform provides a complete pipeline for collaborative driving. From the system perspective, we introduce CoDriving, a novel end-to-end collaborative driving system that properly integrates V2X communication over the entire autonomous pipeline, promoting driving with shared perceptual information. The core idea is a novel driving-oriented communication strategy. Leveraging this strategy, CoDriving improves driving performance while optimizing communication efficiency. We make comprehensive benchmarks with V2Xverse, analyzing both modular performance and closed-loop driving performance. Experimental results show that CoDriving: i) significantly improves the driving score by 62.49% and drastically reduces the pedestrian collision rate by 53.50% compared to the SOTA end-to-end driving method, and ii) achieves sustaining driving performance superiority over dynamic constraint communication conditions.
Abstract:This paper investigates the robust design of symbol-level precoding (SLP) for multiuser multiple-input multiple-output (MIMO) downlink transmission with imperfect channel state information (CSI) caused by channel aging. By utilizing the a posteriori channel model based on the widely adopted jointly correlated channel model, the imperfect CSI is modeled as the statistical CSI incorporating the channel mean and channel variance information with spatial correlation. With the signal model in the presence of channel aging, we formulate the signal-to-noise-plus-interference ratio (SINR) balancing and minimum mean square error (MMSE) problems for robust SLP design. The former targets to maximize the minimum SINR across users, while the latter minimizes the mean square error between the received signal and the target constellation point. When it comes to massive MIMO scenarios, the increment in the number of antennas poses a computational complexity challenge, limiting the deployment of SLP schemes. To address such a challenge, we simplify the objective function of the SINR balancing problem and further derive a closed-form SLP scheme. Besides, by approximating the matrix involved in the computation, we modify the proposed algorithm and develop an MMSE-based SLP scheme with lower computation complexity. Simulation results confirm the superiority of the proposed schemes over the state-of-the-art SLP schemes.
Abstract:Roadside unit (RSU) can significantly improve the safety and robustness of autonomous vehicles through Vehicle-to-Everything (V2X) communication. Currently, the usage of a single RSU mainly focuses on real-time inference and V2X collaboration, while neglecting the potential value of the high-quality data collected by RSU sensors. Integrating the vast amounts of data from numerous RSUs can provide a rich source of data for model training. However, the absence of ground truth annotations and the difficulty of transmitting enormous volumes of data are two inevitable barriers to fully exploiting this hidden value. In this paper, we introduce FedRSU, an innovative federated learning framework for self-supervised scene flow estimation. In FedRSU, we present a recurrent self-supervision training paradigm, where for each RSU, the scene flow prediction of points at every timestamp can be supervised by its subsequent future multi-modality observation. Another key component of FedRSU is federated learning, where multiple devices collaboratively train an ML model while keeping the training data local and private. With the power of the recurrent self-supervised learning paradigm, FL is able to leverage innumerable underutilized data from RSU. To verify the FedRSU framework, we construct a large-scale multi-modality dataset RSU-SF. The dataset consists of 17 RSU clients, covering various scenarios, modalities, and sensor settings. Based on RSU-SF, we show that FedRSU can greatly improve model performance in ITS and provide a comprehensive benchmark under diverse FL scenarios. To the best of our knowledge, we provide the first real-world LiDAR-camera multi-modal dataset and benchmark for the FL community.
Abstract:This paper investigates the application of a unified non-orthogonal multiple access framework in beam hopping (U-NOMA-BH) based satellite communication systems. More specifically, the proposed U-NOMA-BH framework can be applied to code-domain NOMA based BH (CD-NOMA-BH) and power-domain NOMA based BH (PD-NOMA-BH) systems. To satisfy dynamic-uneven traffic demands, we formulate the optimization problem to minimize the square of discrete difference by jointly optimizing power allocation, carrier assignment and beam scheduling. The non-convexity of the objective function and the constraint condition is solved through Dinkelbach's transform and variable relaxation. As a further development, the closed-from and asymptotic expressions of outage probability are derived for CD/PD-NOMA-BH systems. Based on approximated results, the diversity orders of a pair of users are obtained in detail. In addition, the system throughput of U-NOMA-BH is discussed in delay-limited transmission mode. Numerical results verify that: i) The gap between traffic requests of CD/PD-NOMA-BH systems appears to be more closely compared with orthogonal multiple access based BH (OMA-BH); ii) The CD-NOMA-BH system is capable of providing the enhanced traffic request and capacity provision; and iii) The outage behaviors of CD/PD-NOMA-BH are better than that of OMA-BH.
Abstract:Interaction-aware Autonomous Driving (IAAD) is a rapidly growing field of research that focuses on the development of autonomous vehicles (AVs) that are capable of interacting safely and efficiently with human road users. This is a challenging task, as it requires the autonomous vehicle to be able to understand and predict the behaviour of human road users. In this literature review, the current state of IAAD research is surveyed in this work. Commencing with an examination of terminology, attention is drawn to challenges and existing models employed for modelling the behaviour of drivers and pedestrians. Next, a comprehensive review is conducted on various techniques proposed for interaction modelling, encompassing cognitive methods, machine learning approaches, and game-theoretic methods. The conclusion is reached through a discussion of potential advantages and risks associated with IAAD, along with the illumination of pivotal research inquiries necessitating future exploration.