Abstract:Understanding the traffic scenes and then generating high-definition (HD) maps present significant challenges in autonomous driving. In this paper, we defined a novel Traffic Topology Scene Graph, a unified scene graph explicitly modeling the lane, controlled and guided by different road signals (e.g., right turn), and topology relationships among them, which is always ignored by previous high-definition (HD) mapping methods. For the generation of T2SG, we propose TopoFormer, a novel one-stage Topology Scene Graph TransFormer with two newly designed layers. Specifically, TopoFormer incorporates a Lane Aggregation Layer (LAL) that leverages the geometric distance among the centerline of lanes to guide the aggregation of global information. Furthermore, we proposed a Counterfactual Intervention Layer (CIL) to model the reasonable road structure ( e.g., intersection, straight) among lanes under counterfactual intervention. Then the generated T2SG can provide a more accurate and explainable description of the topological structure in traffic scenes. Experimental results demonstrate that TopoFormer outperforms existing methods on the T2SG generation task, and the generated T2SG significantly enhances traffic topology reasoning in downstream tasks, achieving a state-of-the-art performance of 46.3 OLS on the OpenLane-V2 benchmark. We will release our source code and model.
Abstract:In current open real-world autonomous driving scenarios, challenges such as sensor failure and extreme weather conditions hinder the generalization of most autonomous driving perception models to these unseen domain due to the domain shifts between the test and training data. As the parameter scale of autonomous driving perception models grows, traditional test-time adaptation (TTA) methods become unstable and often degrade model performance in most scenarios. To address these challenges, this paper proposes two new robust methods to improve the Batch Normalization with TTA for object detection in autonomous driving: (1) We introduce a LearnableBN layer based on Generalized-search Entropy Minimization (GSEM) method. Specifically, we modify the traditional BN layer by incorporating auxiliary learnable parameters, which enables the BN layer to dynamically update the statistics according to the different input data. (2) We propose a new semantic-consistency based dual-stage-adaptation strategy, which encourages the model to iteratively search for the optimal solution and eliminates unstable samples during the adaptation process. Extensive experiments on the NuScenes-C dataset shows that our method achieves a maximum improvement of about 8% using BEVFormer as the baseline model across six corruption types and three levels of severity. We will make our source code available soon.
Abstract:The next generation wireless systems will face stringent new requirements, including ultra-low latency, high data rates and enhanced reliability. Large Intelligent Surfaces, is one proposed solution that has the potential to solve these high demands. The real-life deployment of such systems involves different design considerations with non-trivial trade-offs. This paper investigates the trade-off between spectral efficiency and processing latency, considering different antenna distribution schemes and detection algorithms. A latency model for the physical layer processing has been developed with realistic hardware parameters. The simulation results using an in-door environment show that distributing the antennas throughout the scenario improves the overall reliability, while the impact on the latency is limited when zero-forcing (ZF) detection is used. On the other hand, changing the detection algorithm to maximum-ratio combining (MRC) may reduce the latency significantly, even if a larger number of antennas are needed to achieve a similar spectrum efficiency as ZF detection.
Abstract:Recent advancements in UAV technology have spurred interest in developing multi-UAV aerial surveying systems for use in confined environments where GNSS signals are blocked or jammed. This paper focuses airborne magnetic surveying scenarios. To obtain clean magnetic measurements reflecting the Earth's magnetic field, the magnetic sensor must be isolated from other electronic devices, creating a significant localization challenge. We propose a visual cooperative localization solution. The solution incorporates a visual processing module and an improved manifold-based sensor fusion algorithm, delivering reliable and accurate positioning information. Real flight experiments validate the approach, demonstrating single-axis centimeter-level accuracy and decimeter-level overall 3D positioning accuracy.
Abstract:Large language models (LLMs) have demonstrated remarkable capabilities in generating human-like text and exhibiting personality traits similar to those in humans. However, the mechanisms by which LLMs encode and express traits such as agreeableness and impulsiveness remain poorly understood. Drawing on the theory of social determinism, we investigate how long-term background factors, such as family environment and cultural norms, interact with short-term pressures like external instructions, shaping and influencing LLMs' personality traits. By steering the output of LLMs through the utilization of interpretable features within the model, we explore how these background and pressure factors lead to changes in the model's traits without the need for further fine-tuning. Additionally, we suggest the potential impact of these factors on model safety from the perspective of personality.
Abstract:Large Language Models have demonstrated remarkable abilities across various tasks, with Chain-of-Thought (CoT) prompting emerging as a key technique to enhance reasoning capabilities. However, existing research primarily focuses on improving performance, lacking a comprehensive framework to explain and understand the fundamental factors behind CoT's success. To bridge this gap, we introduce a novel perspective grounded in the Hopfieldian view of cognition in cognitive neuroscience. We establish a connection between CoT reasoning and key cognitive elements such as stimuli, actions, neural populations, and representation spaces. From our view, we can understand the reasoning process as the movement between these representation spaces. Building on this insight, we develop a method for localizing reasoning errors in the response of CoTs. Moreover, we propose the Representation-of-Thought (RoT) framework, which leverages the robustness of low-dimensional representation spaces to enhance the robustness of the reasoning process in CoTs. Experimental results demonstrate that RoT improves the robustness and interpretability of CoT reasoning while offering fine-grained control over the reasoning process.
Abstract:LiDAR panoptic segmentation, which jointly performs instance and semantic segmentation for things and stuff classes, plays a fundamental role in LiDAR perception tasks. While most existing methods explicitly separate these two segmentation tasks and utilize different branches (i.e., semantic and instance branches), some recent methods have embraced the query-based paradigm to unify LiDAR panoptic segmentation. However, the distinct spatial distribution and inherent characteristics of objects(things) and their surroundings(stuff) in 3D scenes lead to challenges, including the mutual competition of things/stuff and the ambiguity of classification/segmentation. In this paper, we propose decoupling things/stuff queries according to their intrinsic properties for individual decoding and disentangling classification/segmentation to mitigate ambiguity. To this end, we propose a novel framework dubbed DQFormer to implement semantic and instance segmentation in a unified workflow. Specifically, we design a decoupled query generator to propose informative queries with semantics by localizing things/stuff positions and fusing multi-level BEV embeddings. Moreover, a query-oriented mask decoder is introduced to decode corresponding segmentation masks by performing masked cross-attention between queries and mask embeddings. Finally, the decoded masks are combined with the semantics of the queries to produce panoptic results. Extensive experiments on nuScenes and SemanticKITTI datasets demonstrate the superiority of our DQFormer framework.
Abstract:This paper considers networked sensing in cellular network, where multiple base stations (BSs) first compress their received echo signals from multiple targets and then forward the quantized signals to the cloud via limited-capacity backhaul links, such that the cloud can leverage all useful echo signals to perform high-resolution localization. Under this setup, we manage to characterize the posterior Cramer-Rao Bound (PCRB) for localizing all the targets as a function of the transmit covariance matrix and the compression noise covariance matrix of each BS. Then, a PCRB minimization problem subject to the transmit power constraints and the backhaul capacity constraints is formulated to jointly design the BSs' transmission and compression strategies. We propose an efficient algorithm to solve this problem based on the alternating optimization technique. Specifically, it is shown that when either the transmit covariance matrices or the compression noise covariance matrices are fixed, the successive convex approximation technique can be leveraged to optimize the other type of covariance matrices locally. Numerical results are provided to verify the effectiveness of our proposed algorithm.
Abstract:Due to circuit failures, defective elements that cannot adaptively adjust the phase shifts of their impinging signals in a desired manner may exist on an intelligent reflecting surface (IRS). Traditional way to find these defective IRS elements requires a thorough diagnosis of all the circuits belonging to a huge number of IRS elements, which is practically challenging. In this paper, we will devise a novel approach under which a transmitter sends known pilot signals and a receiver localizes all the defective IRS elements just based on its over-the-air measurements reflected from the IRS. The key lies in the fact that the over-the-air measurements at the receiver side are functions of the set of defective IRS elements. Based on this observation, we propose a bisection based method to localize all the defective IRS elements. Specifically, at each time slot, we properly control the desired phase shifts of all the IRS elements such that half of the considered regime that is not useful to localize the defective elements can be found based on the received signals and removed. Via numerical results, it is shown that our proposed bisection method can exploit the over-the-air measurements to localize all the defective IRS elements quickly and accurately.
Abstract:Holographic MIMO communications, enabled by large-scale antenna arrays with quasi-continuous apertures, is a potential technology for spectrum efficiency improvement. However, the increased antenna aperture size extends the range of the Fresnel region, leading to a hybrid near-far field communication mode. The users and scatterers randomly lie in near-field and far-field zones, and thus, conventional far-field-only and near-field-only channel estimation methods may not work. To tackle this challenge, we demonstrate the existence of the power diffusion (PD) effect, which leads to a mismatch between the hybrid-field channel and existing channel estimation methods. Specifically, in far-field and near-field transform domains, the power gain of one channel path may diffuse to other positions, thus generating fake paths. This renders the conventional techniques unable to detect those real paths. We propose a PD-aware orthogonal matching pursuit algorithm to eliminate the influence of the PD effect by identifying the PD range within which paths diffuse to other positions. PD-OMP fits a general case without prior knowledge of near-field and far-field path numbers and the user's location. The computational complexity of PD-OMP and the Cramer-Rao Lower Bound for the sparse-signal-recovery-based channel estimation are also derived. Simulation results show that PD-OMP outperforms state-of-the-art hybrid-field channel estimation methods.