Abstract:Recent advancements in UAV technology have spurred interest in developing multi-UAV aerial surveying systems for use in confined environments where GNSS signals are blocked or jammed. This paper focuses airborne magnetic surveying scenarios. To obtain clean magnetic measurements reflecting the Earth's magnetic field, the magnetic sensor must be isolated from other electronic devices, creating a significant localization challenge. We propose a visual cooperative localization solution. The solution incorporates a visual processing module and an improved manifold-based sensor fusion algorithm, delivering reliable and accurate positioning information. Real flight experiments validate the approach, demonstrating single-axis centimeter-level accuracy and decimeter-level overall 3D positioning accuracy.
Abstract:Large language models (LLMs) have demonstrated remarkable capabilities in generating human-like text and exhibiting personality traits similar to those in humans. However, the mechanisms by which LLMs encode and express traits such as agreeableness and impulsiveness remain poorly understood. Drawing on the theory of social determinism, we investigate how long-term background factors, such as family environment and cultural norms, interact with short-term pressures like external instructions, shaping and influencing LLMs' personality traits. By steering the output of LLMs through the utilization of interpretable features within the model, we explore how these background and pressure factors lead to changes in the model's traits without the need for further fine-tuning. Additionally, we suggest the potential impact of these factors on model safety from the perspective of personality.
Abstract:Large Language Models have demonstrated remarkable abilities across various tasks, with Chain-of-Thought (CoT) prompting emerging as a key technique to enhance reasoning capabilities. However, existing research primarily focuses on improving performance, lacking a comprehensive framework to explain and understand the fundamental factors behind CoT's success. To bridge this gap, we introduce a novel perspective grounded in the Hopfieldian view of cognition in cognitive neuroscience. We establish a connection between CoT reasoning and key cognitive elements such as stimuli, actions, neural populations, and representation spaces. From our view, we can understand the reasoning process as the movement between these representation spaces. Building on this insight, we develop a method for localizing reasoning errors in the response of CoTs. Moreover, we propose the Representation-of-Thought (RoT) framework, which leverages the robustness of low-dimensional representation spaces to enhance the robustness of the reasoning process in CoTs. Experimental results demonstrate that RoT improves the robustness and interpretability of CoT reasoning while offering fine-grained control over the reasoning process.
Abstract:LiDAR panoptic segmentation, which jointly performs instance and semantic segmentation for things and stuff classes, plays a fundamental role in LiDAR perception tasks. While most existing methods explicitly separate these two segmentation tasks and utilize different branches (i.e., semantic and instance branches), some recent methods have embraced the query-based paradigm to unify LiDAR panoptic segmentation. However, the distinct spatial distribution and inherent characteristics of objects(things) and their surroundings(stuff) in 3D scenes lead to challenges, including the mutual competition of things/stuff and the ambiguity of classification/segmentation. In this paper, we propose decoupling things/stuff queries according to their intrinsic properties for individual decoding and disentangling classification/segmentation to mitigate ambiguity. To this end, we propose a novel framework dubbed DQFormer to implement semantic and instance segmentation in a unified workflow. Specifically, we design a decoupled query generator to propose informative queries with semantics by localizing things/stuff positions and fusing multi-level BEV embeddings. Moreover, a query-oriented mask decoder is introduced to decode corresponding segmentation masks by performing masked cross-attention between queries and mask embeddings. Finally, the decoded masks are combined with the semantics of the queries to produce panoptic results. Extensive experiments on nuScenes and SemanticKITTI datasets demonstrate the superiority of our DQFormer framework.
Abstract:This paper considers networked sensing in cellular network, where multiple base stations (BSs) first compress their received echo signals from multiple targets and then forward the quantized signals to the cloud via limited-capacity backhaul links, such that the cloud can leverage all useful echo signals to perform high-resolution localization. Under this setup, we manage to characterize the posterior Cramer-Rao Bound (PCRB) for localizing all the targets as a function of the transmit covariance matrix and the compression noise covariance matrix of each BS. Then, a PCRB minimization problem subject to the transmit power constraints and the backhaul capacity constraints is formulated to jointly design the BSs' transmission and compression strategies. We propose an efficient algorithm to solve this problem based on the alternating optimization technique. Specifically, it is shown that when either the transmit covariance matrices or the compression noise covariance matrices are fixed, the successive convex approximation technique can be leveraged to optimize the other type of covariance matrices locally. Numerical results are provided to verify the effectiveness of our proposed algorithm.
Abstract:Due to circuit failures, defective elements that cannot adaptively adjust the phase shifts of their impinging signals in a desired manner may exist on an intelligent reflecting surface (IRS). Traditional way to find these defective IRS elements requires a thorough diagnosis of all the circuits belonging to a huge number of IRS elements, which is practically challenging. In this paper, we will devise a novel approach under which a transmitter sends known pilot signals and a receiver localizes all the defective IRS elements just based on its over-the-air measurements reflected from the IRS. The key lies in the fact that the over-the-air measurements at the receiver side are functions of the set of defective IRS elements. Based on this observation, we propose a bisection based method to localize all the defective IRS elements. Specifically, at each time slot, we properly control the desired phase shifts of all the IRS elements such that half of the considered regime that is not useful to localize the defective elements can be found based on the received signals and removed. Via numerical results, it is shown that our proposed bisection method can exploit the over-the-air measurements to localize all the defective IRS elements quickly and accurately.
Abstract:Holographic MIMO communications, enabled by large-scale antenna arrays with quasi-continuous apertures, is a potential technology for spectrum efficiency improvement. However, the increased antenna aperture size extends the range of the Fresnel region, leading to a hybrid near-far field communication mode. The users and scatterers randomly lie in near-field and far-field zones, and thus, conventional far-field-only and near-field-only channel estimation methods may not work. To tackle this challenge, we demonstrate the existence of the power diffusion (PD) effect, which leads to a mismatch between the hybrid-field channel and existing channel estimation methods. Specifically, in far-field and near-field transform domains, the power gain of one channel path may diffuse to other positions, thus generating fake paths. This renders the conventional techniques unable to detect those real paths. We propose a PD-aware orthogonal matching pursuit algorithm to eliminate the influence of the PD effect by identifying the PD range within which paths diffuse to other positions. PD-OMP fits a general case without prior knowledge of near-field and far-field path numbers and the user's location. The computational complexity of PD-OMP and the Cramer-Rao Lower Bound for the sparse-signal-recovery-based channel estimation are also derived. Simulation results show that PD-OMP outperforms state-of-the-art hybrid-field channel estimation methods.
Abstract:Chain-of-Thought (CoT) holds a significant place in augmenting the reasoning performance for large language models (LLMs). While some studies focus on improving CoT accuracy through methods like retrieval enhancement, yet a rigorous explanation for why CoT achieves such success remains unclear. In this paper, we analyze CoT methods under two different settings by asking the following questions: (1) For zero-shot CoT, why does prompting the model with "let's think step by step" significantly impact its outputs? (2) For few-shot CoT, why does providing examples before questioning the model could substantially improve its reasoning ability? To answer these questions, we conduct a top-down explainable analysis from the Hopfieldian view and propose a Read-and-Control approach for controlling the accuracy of CoT. Through extensive experiments on seven datasets for three different tasks, we demonstrate that our framework can decipher the inner workings of CoT, provide reasoning error localization, and control to come up with the correct reasoning path.
Abstract:Multimodal Visual Object Tracking (VOT) has recently gained significant attention due to its robustness. Early research focused on fully fine-tuning RGB-based trackers, which was inefficient and lacked generalized representation due to the scarcity of multimodal data. Therefore, recent studies have utilized prompt tuning to transfer pre-trained RGB-based trackers to multimodal data. However, the modality gap limits pre-trained knowledge recall, and the dominance of the RGB modality persists, preventing the full utilization of information from other modalities. To address these issues, we propose a novel symmetric multimodal tracking framework called SDSTrack. We introduce lightweight adaptation for efficient fine-tuning, which directly transfers the feature extraction ability from RGB to other domains with a small number of trainable parameters and integrates multimodal features in a balanced, symmetric manner. Furthermore, we design a complementary masked patch distillation strategy to enhance the robustness of trackers in complex environments, such as extreme weather, poor imaging, and sensor failure. Extensive experiments demonstrate that SDSTrack outperforms state-of-the-art methods in various multimodal tracking scenarios, including RGB+Depth, RGB+Thermal, and RGB+Event tracking, and exhibits impressive results in extreme conditions. Our source code is available at https://github.com/hoqolo/SDSTrack.
Abstract:Integrated sensing and communication (ISAC) has recently attracted tremendous attention from both academia and industry, being envisioned as a key part of the standards for the sixth-generation (6G) cellular network. A key challenge of 6G-oriented ISAC lies in how to perform ubiquitous sensing based on the communication signals and devices. Previous works have made great progresses on studying the signal waveform design that leads to optimal communication-sensing performance tradeoff. In this article, we aim to focus on issues arising from the exploitation of the communication devices for sensing in 6G network. Particularly, we will discuss about how to leverage various nodes available in the cellular network as anchors to perform ubiquitous sensing. On one hand, the base stations (BSs) will be the most important anchors in the future 6G ISAC network, since they can generate/process radio signals with high range/angle resolutions, and their positions are precisely known. Correspondingly, we will first study the BS-based sensing technique. On the other hand, the BSs alone may not enable ubiquitous sensing, since they cannot cover all the places with strong line-of-sight (LOS) links. This motivates us to investigate the possibility of using other nodes that are with higher density in the network to act as the anchors. Along this line, we are interested in two types of new anchors - user equipments (UEs) and reconfigurable intelligent surfaces (RISs). This paper will shed light on the opportunities and challenges brought by UE-assisted sensing and RIS-assisted sensing. Our goal is to devise a novel 6G-oriented sensing architecture where BSs, UEs, and RISs can work together to provide ubiquitous sensing services.