Abstract: High-precision direction-of-arrival (DOA) estimation, a key sensing capability for 6G-enabled applications such as autonomous driving and extended reality, increasingly depends on the effective exploitation of spatial degrees of freedom (DOFs). This paper integrates two frontier DOF-oriented paradigms and proposes a fluid antenna-enabled hybrid analog-digital (FA-HAD) architecture, which features an extremely lightweight front-end configuration mechanism and efficient exploitation of spatial DOFs. Within this architecture, a collaborative spatial-phase sampling strategy is first developed to enable real-time 2-D DOA estimation under compressive observations, and a single-source Cramer-Rao lower bound (CRLB) analysis is provided to quantify the achievable performance limit, offering quantitative guidance for accuracy-overhead trade-offs. Furthermore, an efficient virtual-array spatial covariance matrix (SCM) reconstruction method is proposed to recover a physically meaningful covariance representation, thereby providing a covariance-domain interface that is directly reusable by a broad class of existing covariance-based array processing and array design techniques, which strengthens the scalability and transferability of the proposed architecture. Building upon the reconstructed SCM, a Jacobi-Anger expansion based dimension-reduced MUSIC estimator is then derived for arbitrary planar arrays at favorable computational cost. Simulation results demonstrate that the proposed FA-HAD framework attains DOA accuracy close to that of fully digital systems while substantially reducing RF hardware complexity and training overhead.
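For context, the Jacobi-Anger expansion invoked above is the classical identity (standard mathematics, not the paper's specific derivation):

```latex
e^{\,j z \cos\theta} \;=\; \sum_{n=-\infty}^{\infty} j^{\,n} J_n(z)\, e^{\,j n \theta}
```

Applied to a planar-array steering element of the form $e^{\,j (2\pi/\lambda) r_m \cos(\phi - \theta_m)}$, the identity separates the unknown source direction $\phi$ from the sensor geometry $(r_m, \theta_m)$; this angle-geometry separation is what typically enables dimension-reduced MUSIC on arbitrary planar arrays, with the series truncated to a finite number of Bessel terms $J_n$.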
Abstract: LiDAR semantic segmentation plays a pivotal role in 3D scene understanding for edge applications such as autonomous driving. However, significant challenges remain for real-world deployments, particularly for on-device post-deployment adaptation. Real-world environments can shift as the system navigates through different locations, leading to substantial performance degradation without effective and timely model adaptation. Furthermore, edge systems operate under strict computational and energy constraints, making it infeasible to adapt conventional segmentation models (based on large neural networks) directly on-device. To address these challenges, we introduce HyperLiDAR, the first lightweight, post-deployment LiDAR segmentation framework based on Hyperdimensional Computing (HDC). The design of HyperLiDAR fully leverages the fast learning and high efficiency of HDC, inspired by how the human brain processes information. To further improve adaptation efficiency, we identify the high data volume per scan as a key bottleneck and introduce a buffer selection strategy that focuses learning on the most informative points. We conduct extensive evaluations on two state-of-the-art LiDAR segmentation benchmarks and two representative devices. Our results show that HyperLiDAR outperforms or matches the adaptation performance of state-of-the-art segmentation methods while achieving up to a 13.8x speedup in retraining.
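A minimal sketch of the two ingredients named above, HDC prototype learning and informativeness-based buffer selection, under illustrative assumptions: the hypervector dimension, the random-projection encoder, and the margin-based selection criterion are all hypothetical choices, not the HyperLiDAR design.

```python
import numpy as np

rng = np.random.default_rng(0)
D, F, C = 2048, 4, 3      # hypervector dim, point features (x, y, z, intensity), classes

proj = rng.standard_normal((F, D))        # fixed random-projection encoder

def encode(points):
    # map real-valued point features to bipolar hypervectors
    return np.sign(points @ proj)

def similarity(hv, prototypes):
    # cosine similarity of each hypervector against each class prototype
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-9)
    h = hv / (np.linalg.norm(hv, axis=1, keepdims=True) + 1e-9)
    return h @ p.T

# build class prototypes by bundling (summing) labeled point hypervectors
points = rng.standard_normal((100, F))
labels = rng.integers(0, C, 100)
hv = encode(points)
protos = np.zeros((C, D))
for c in range(C):
    protos[c] = hv[labels == c].sum(axis=0)

# adaptation on a new scan: keep only the most ambiguous points (small
# top-1 minus top-2 similarity margin) as the learning buffer
new_pts = rng.standard_normal((500, F))
new_lbl = rng.integers(0, C, 500)
sim = similarity(encode(new_pts), protos)
srt = np.sort(sim, axis=1)
margin = srt[:, -1] - srt[:, -2]
buffer = np.argsort(margin)[:50]          # 50 most informative points per scan
for i in buffer:                          # lightweight prototype update
    protos[new_lbl[i]] += encode(new_pts[i : i + 1])[0]
```

Because adaptation touches only additive prototype updates on a small buffer, there is no gradient computation, which is the source of the retraining speedup HDC methods typically claim.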
Abstract: With the rapid expansion of low-altitude economy (LAE) services and the growing demand for integrated sensing and communication (ISAC) in air-ground networks, reliable direction-of-arrival (DOA) estimation has become essential for both directional communication and sensing functions. DOA underpins beam alignment, spatial-reuse scheduling, and ISAC-critical tasks such as airspace situational awareness and multi-target monitoring. Hybrid analog-digital (HAD) architectures have emerged as a practical solution for large-aperture directional operation under stringent radio frequency (RF), analog-to-digital converter (ADC), and size, weight, and power (SWaP) constraints. However, HAD compresses antenna-domain observations through analog combining, fundamentally reshaping the measurement model and introducing new algorithmic and system-level challenges for DOA estimation. This article first reviews the principles and representative architectures of HAD, highlighting their advantages for scalable beam-centric and ISAC-oriented operation in LAE scenarios. We then provide a structured overview of HAD-enabled DOA estimation methodologies, including spatial covariance matrix (SCM) reconstruction, multi-combiner scan-based acquisition, and pilot-aided estimation, along with key design tradeoffs. Finally, we discuss open challenges and outline reliability-driven research directions toward robust, deployable HAD-enabled DOA solutions for practical ISAC-enabled low-altitude environments.
Abstract: Unlike fixed-position arrays with static observation entropy, the scalable fluid antenna system (S-FAS) can dynamically adjust its aperture to form different observation spaces with configuration-dependent entropy budgets. This reconfigurability requires an information-theoretic framework beyond traditional algebraic identifiability analysis. This paper establishes an observation entropy framework for S-FAS, which unifies the derivation of identifiability limits, the diagnosis of processing bottlenecks, and system design optimization. For an S-FAS with mutual coupling suppression, we derive a complete capacity hierarchy among compressed, extended, and jointly stacked configurations. The entropy framework reveals that sequential two-stage processing suffers from an information bottleneck that restricts achievable capacity, while the noise entropy ratio can be used to distinguish fundamental performance limits from algorithmic deficiencies. A joint MUSIC algorithm is proposed to approach the theoretical joint capacity bound. Extensive Monte Carlo simulations, validated by both algebraic and information-theoretic criteria, verify the derived capacity hierarchy and identifiability boundaries.
Abstract: Autonomous driving requires reliable reasoning over fine-grained 3D scene facts. Fine-grained question answering over multi-modal driving observations provides a natural way to evaluate this capability, yet existing perception pipelines and driving-oriented large language model (LLM) methods still suffer from unreliable scene facts, hallucinations, opaque reasoning, and heavy reliance on task-specific training. We present KLDrive, the first knowledge-graph-augmented LLM reasoning framework for fine-grained question answering in autonomous driving. KLDrive addresses this problem by designing two tightly coupled components: an energy-based scene fact construction module that consolidates multi-source evidence into a reliable scene knowledge graph, and an LLM agent that performs fact-grounded reasoning over a constrained action space under explicit structural constraints. By combining structured prompting with few-shot in-context exemplars, the framework adapts to diverse reasoning tasks without heavy task-specific fine-tuning. Experiments on two large-scale autonomous-driving QA benchmarks show that KLDrive outperforms prior state-of-the-art methods, achieving the best overall accuracy of 65.04% on NuScenes-QA and the best SPICE score of 42.45 on GVQA. On counting, the most challenging factual reasoning task, it improves over the strongest baseline by 46.01 percentage points, demonstrating substantially reduced hallucinations and the benefit of coupling reliable scene fact construction with explicit reasoning.
Abstract: Large-scale sparse multi-objective optimization problems (LSMOPs) are prevalent in real-world applications, where optimal solutions typically contain only a few nonzero variables, such as in adversarial attacks, critical node detection, and sparse signal reconstruction. Since the function evaluation of LSMOPs often relies on large-scale datasets involving a large number of decision variables, the search space becomes extremely high-dimensional. The coexistence of sparsity and high dimensionality greatly intensifies the conflict between exploration and exploitation, making it difficult for existing multi-objective evolutionary algorithms (MOEAs) to identify the critical nonzero decision variables within limited function evaluations. To address this challenge, this paper proposes an evolutionary algorithm with probabilistic annealing for large-scale sparse multi-objective optimization. The algorithm is driven by two probability vectors with distinct entropy characteristics: a convergence-oriented probability vector with relatively low entropy ensures stable exploitation, whereas an annealed probability vector with gradually decreasing entropy enables an adaptive transition from global exploration to local refinement. By integrating these complementary search dynamics, the proposed algorithm achieves a dynamic equilibrium between exploration and exploitation. Experimental results on benchmark problems and real-world applications demonstrate that the proposed algorithm outperforms state-of-the-art evolutionary algorithms in terms of both convergence and diversity.
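The two-probability-vector dynamic described above can be sketched as follows. This is a hypothetical single-objective simplification for illustration only: the toy objective, population scheme, and update rates are assumptions, not the paper's algorithm; the point is that the convergence-oriented vector stays low-entropy while the annealed vector drifts from near-uniform (high entropy) toward a deterministic sparse support (low entropy).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200  # high-dimensional decision space with a sparse optimum

def sparse_objective(x):
    # toy objective: reward hitting a hidden 5-variable support, penalize density
    support = np.zeros(n)
    support[:5] = 1.0
    return -np.sum(x * support) + 0.1 * x.sum()

p_conv = np.full(n, 0.02)    # convergence-oriented: low entropy, sparse samples
p_anneal = np.full(n, 0.5)   # annealed: starts at maximum entropy

best, best_f = None, np.inf
for gen in range(100):
    for p in (p_conv, p_anneal):
        # sample a binary mask of nonzero variables from each probability vector
        x = (rng.random(n) < p).astype(float)
        f = sparse_objective(x)
        if f < best_f:
            best, best_f = x, f
    # exploitation: keep the low-entropy vector near the best-so-far support
    p_conv = 0.9 * p_conv + 0.1 * best
    # annealing: cooling rate grows over generations, so the entropy of
    # p_anneal gradually decreases from exploration toward refinement
    rate = 0.02 + 0.08 * gen / 100
    p_anneal = (1 - rate) * p_anneal + rate * best
```

In a multi-objective setting the "best" target would be replaced by nondominated solutions, but the entropy contrast between the two vectors works the same way.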
Abstract: Chain-of-thought (CoT) improves reasoning reliability but increases token cost, motivating post-training compression of explicit reasoning traces. However, the shortest sufficient reasoning is not universal: it depends on difficulty, model capacity, and training state, making fixed-length targets brittle. In practice, naive RL-based compression can also undesirably shorten the user-facing answer, because a single completion-level learning signal leaks across the think/answer boundary. We propose Difficulty-Scaled Segment-Wise GRPO (DSS-GRPO), which decomposes returns into think and answer components, computes group-relative advantages per segment, and routes them with hard token masks so compression updates act only on think while answer alignment acts only on answer. DSS-GRPO uses prompt-wise within-group shaping and difficulty-aware scaling to encourage concise reasoning without collapsing answer behavior.
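The segment-wise routing can be sketched numerically. This is an illustrative reduction, not the paper's exact formulation: group size, segment boundary, and rewards are made up, but it shows how per-segment group-relative advantages combined with hard token masks keep the think-compression signal from leaking into answer tokens.

```python
import numpy as np

rng = np.random.default_rng(0)
G, T = 4, 12  # completions per prompt group, tokens per completion

# hypothetical per-completion segment rewards, e.g. brevity for the think
# segment and correctness/alignment for the answer segment
r_think = rng.standard_normal(G)
r_answer = rng.standard_normal(G)

def group_advantage(r):
    # GRPO-style group-relative normalization within one prompt's group
    return (r - r.mean()) / (r.std() + 1e-8)

a_think = group_advantage(r_think)
a_answer = group_advantage(r_answer)

# hard token masks: assume the first 8 tokens are <think>, the rest answer
think_mask = np.zeros((G, T))
think_mask[:, :8] = 1.0
answer_mask = 1.0 - think_mask

# each segment's advantage is broadcast only onto its own tokens, so a
# compression penalty on think cannot shorten the user-facing answer
adv_tokens = a_think[:, None] * think_mask + a_answer[:, None] * answer_mask
```

A difficulty-aware variant would additionally scale `a_think` per prompt (e.g. weaker compression pressure on hard prompts) before the broadcast; the mask routing is unchanged.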
Abstract: Speech production is a complex process spanning neural planning, motor control, muscle activation, and articulatory kinematics. While the acoustic speech signal is the most accessible product of the speech production act, it does not directly reveal its causal neurophysiological substrates. We present the first simultaneous acquisition of real-time (dynamic) MRI, EEG, and surface EMG, capturing several key aspects of the speech production chain: brain signals, muscle activations, and articulatory movements. This multimodal acquisition paradigm presents substantial technical challenges, including MRI-induced electromagnetic interference and myogenic artifacts. To mitigate these, we introduce an artifact suppression pipeline tailored to this tri-modal setting. Once fully developed, this framework is poised to offer an unprecedented window into speech neuroscience and insights leading to brain-computer interface advances.
Abstract: Simultaneous recording of electroencephalography (EEG) and functional MRI (fMRI) can provide a more complete view of brain function by merging high temporal and spatial resolutions. High-field ($\geq$3T) systems are standard but entail technical trade-offs, including artifacts in the EEG signal, reduced compatibility with metallic implants, high acoustic noise, and artifacts around high-susceptibility areas such as the optic nerve and nasal sinus. This proof-of-concept study demonstrates the feasibility of simultaneous EEG-fMRI at 0.55T in a visual task. We characterize the gradient and ballistocardiogram (BCG) artifacts inherent to this environment and observe reduced BCG magnitude, consistent with the expected scaling of pulse-related artifacts with static magnetic field strength. This reduction shows promise for facilitating effective denoising while preserving the alpha rhythm and signal integrity. Furthermore, we tested a multimodal integration pipeline and demonstrated that the EEG power envelope corresponds with the hemodynamic BOLD response, supporting the potential to measure neurovascular coupling in this environment. We demonstrate that combined EEG-fMRI at 0.55T is feasible and represents a promising environment for multimodal neuroimaging.
Abstract: Positron emission tomography (PET) offers powerful functional imaging but involves radiation exposure. Efforts to reduce this exposure by lowering the radiotracer dose or scan time can degrade image quality. While using magnetic resonance (MR) images with clearer anatomical information to restore standard-dose PET (SPET) from low-dose PET (LPET) is a promising approach, it faces challenges from structural and textural inconsistencies in multi-modality fusion, as well as mismatch on out-of-distribution (OOD) data. In this paper, we propose a supervise-assisted multi-modality fusion diffusion model (MFdiff) to address these challenges for high-quality PET restoration. First, to fully utilize auxiliary MR images without introducing extraneous details into the restored image, a multi-modality feature fusion module is designed to learn an optimized fusion feature. Second, using the fusion feature as an additional condition, high-quality SPET images are iteratively generated based on the diffusion model. Furthermore, we introduce a two-stage supervise-assisted learning strategy that harnesses both generalized priors from simulated in-distribution datasets and specific priors tailored to in-vivo OOD data. Experiments demonstrate that the proposed MFdiff effectively restores high-quality SPET images from multi-modality inputs and outperforms state-of-the-art methods both qualitatively and quantitatively.