Abstract:While Unmanned Aerial Vehicles (UAVs) have gained significant traction across various fields, path planning in 3D environments remains a critical challenge, particularly under size, weight, and power (SWaP) constraints. Traditional modular planning systems often introduce latency and suboptimal performance due to limited information sharing and local minima issues. End-to-end learning approaches streamline the pipeline by mapping sensory observations directly to actions but require large-scale datasets, face significant sim-to-real gaps, or lack dynamical feasibility. In this paper, we propose a self-supervised UAV trajectory planning pipeline that integrates learning-based depth perception with differentiable trajectory optimization. A 3D cost map guides UAV behavior without expert demonstrations or human labels. Additionally, we incorporate a neural network-based time allocation strategy to improve efficiency and optimality. The system thus combines robust learning-based perception with reliable physics-based optimization for improved generalizability and interpretability. Both simulation and real-world experiments validate our approach across various environments, demonstrating its effectiveness and robustness. Our method achieves a 31.33% improvement in position tracking error and a 49.37% reduction in control effort compared to the state-of-the-art.
Abstract:With the rapid development of large multimodal models (LMMs), multimodal understanding applications are emerging. As most LMM inference requests originate from edge devices with limited computational capabilities, the predominant inference pipeline involves directly forwarding the input data to an edge server which handles all computations. However, this approach introduces high transmission latency due to limited uplink bandwidth of edge devices and significant computation latency caused by the prohibitive number of visual tokens, thus hindering delay-sensitive tasks and degrading user experience. To address this challenge, we propose a task-oriented feature compression (TOFC) method for multimodal understanding in a device-edge co-inference framework, where visual features are merged by clustering and encoded by a learnable and selective entropy model before feature projection. Specifically, we employ density peaks clustering based on K nearest neighbors to reduce the number of visual features, thereby minimizing both data transmission and computational complexity. Subsequently, a learnable entropy model with hyperprior is utilized to encode and decode merged features, further reducing transmission overhead. To enhance compression efficiency, multiple entropy models are adaptively selected based on the characteristics of the visual features, enabling a more accurate estimation of the probability distribution. Comprehensive experiments on seven visual question answering benchmarks validate the effectiveness of the proposed TOFC method. Results show that TOFC achieves up to 60% reduction in data transmission overhead and 50% reduction in system latency while maintaining identical task performance, compared with traditional image compression methods.
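The clustering step named above, density peaks clustering with K-nearest-neighbor density, can be illustrated with a minimal NumPy sketch. This is not the TOFC implementation: the function name `dpc_knn_merge`, the exponential density definition, and the choice to merge each cluster by simple averaging are assumptions made for illustration only.

```python
import numpy as np

def dpc_knn_merge(features, num_clusters, k=5):
    """Merge N feature vectors into num_clusters averaged vectors (illustrative)."""
    # Pairwise Euclidean distances, shape (N, N).
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Local density from the k nearest neighbours (column 0 is the point itself).
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1]
    density = np.exp(-(knn_dist ** 2).mean(axis=1))

    # Distance to the nearest point of strictly higher density;
    # the densest point falls back to its largest distance.
    higher = density[None, :] > density[:, None]
    delta = np.where(higher, dist, np.inf).min(axis=1)
    delta = np.where(np.isinf(delta), dist.max(axis=1), delta)

    # Cluster centres are the points with the largest density * distance score.
    centers = np.argsort(-(density * delta))[:num_clusters]

    # Assign every point to its nearest centre, keeping each centre in its own cluster.
    labels = dist[:, centers].argmin(axis=1)
    labels[centers] = np.arange(num_clusters)

    # Assumed merge rule: average the features inside each cluster.
    merged = np.stack([features[labels == c].mean(axis=0)
                       for c in range(num_clusters)])
    return merged, labels
```

Under these assumptions, calling `dpc_knn_merge(tokens, 64)` on an `(N, D)` array of visual features would return 64 merged vectors, so both the payload to transmit and the number of tokens to process shrink from N to 64.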
Abstract:This study evaluates the performance of Vehicle-to-Vehicle Visible Light Communication (V-VLC) in dynamic environments, focusing on the effects of speed, horizontal offset, and other factors on communication reliability. Using On-Off Keying (OOK) modulation, we analyze the bit error rate (BER), optimal communication distance, correlation time, and the maximum amount of data per communication. Our results demonstrate that maintaining an optimal vehicle distance is critical for stable communication, and that speed and horizontal offset significantly affect communication reliability. This work extends the analysis of V-VLC to real-world dynamic scenarios, providing insights for future research.
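For readers unfamiliar with how BER relates to signal level under On-Off Keying, the following sketch evaluates the textbook expression for equiprobable bits, additive Gaussian noise, and a decision threshold halfway between the two levels. The channel model used in the study is not stated in the abstract, so the function names and the noise model here are assumptions.

```python
import math

def q_function(x: float) -> float:
    """Tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ook_ber(amplitude: float, noise_std: float) -> float:
    """BER of OOK with levels 0 and `amplitude`, Gaussian noise of
    standard deviation `noise_std`, and a threshold at amplitude / 2."""
    return q_function(amplitude / (2.0 * noise_std))
```

Because the received amplitude falls as the inter-vehicle distance or horizontal offset grows, a model of this kind predicts a BER that rises with both, which is consistent with the qualitative findings reported above.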
Abstract:We introduce a novel received signal strength indicator (RSSI)-based positioning method using fluid antenna systems (FAS), leveraging their inherent channel correlation properties to improve location accuracy. By enabling a single antenna to sample multiple spatial positions, FAS exhibits high correlation between its ports. We integrate this high inter-port correlation with a logarithmic path loss model to mitigate the impact of fast fading on RSSI measurements, and derive a simplified multipoint positioning model based on the established relationship between channel correlation and RSSI correlation. A maximum likelihood estimator (MLE) is then developed, for which we provide a closed-form solution. Results demonstrate that our approach outperforms both traditional least squares (LS) methods and single-antenna systems, achieving accuracy comparable to conventional multi-antenna positioning. Furthermore, we analyze the impact of different antenna structures on positioning performance, offering practical guidance for FAS antenna design.
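As background for the comparison with least squares methods, the sketch below shows the conventional baseline: RSSI readings are inverted through a log-distance path loss model and the resulting ranges are fed to a linearized LS multilateration. It is not the proposed FAS estimator. The reference power `p0_dbm`, path loss exponent `n`, reference distance `d0`, and the anchor layout are hypothetical values chosen for illustration.

```python
import numpy as np

def rssi_to_distance(rssi_dbm, p0_dbm=-40.0, n=2.0, d0=1.0):
    """Invert RSSI = p0 - 10 * n * log10(d / d0) for the distance d."""
    return d0 * 10.0 ** ((p0_dbm - np.asarray(rssi_dbm)) / (10.0 * n))

def ls_position(anchors, distances):
    """Linearized least-squares multilateration.

    Subtracting the last anchor's range equation from the others removes
    the quadratic term |x|^2 and leaves a linear system A x = b.
    """
    anchors = np.asarray(anchors, dtype=float)
    distances = np.asarray(distances, dtype=float)
    ref, d_ref = anchors[-1], distances[-1]
    A = 2.0 * (anchors[:-1] - ref)
    b = (d_ref ** 2 - distances[:-1] ** 2
         + (anchors[:-1] ** 2).sum(axis=1) - (ref ** 2).sum())
    solution, *_ = np.linalg.lstsq(A, b, rcond=None)
    return solution
```

With noise-free RSSI this baseline recovers the true position exactly; fast fading perturbs each RSSI value and therefore each inferred range, which is the error source the abstract's inter-port correlation model is meant to suppress.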
Abstract:This paper presents a novel indoor positioning approach that leverages antenna radiation pattern characteristics through Received Signal Strength Indication (RSSI) measurements in a single-antenna system. We derive a maximum likelihood estimation (MLE) algorithm that exploits antenna rotation or radiation pattern reconfiguration and achieves positioning accuracy approaching the Cramér-Rao lower bound (CRLB). Through theoretical analysis, we establish three fundamental theorems that characterize the estimation accuracy bounds and demonstrate how performance improves with increased signal-to-noise ratio, antenna rotation count, and radiation pattern variations. Additionally, we propose a two-position measurement strategy that eliminates dependence on receiving antenna patterns. Simulation results validate that our approach provides an effective solution for indoor robot tracking applications where both accuracy and system simplicity are essential considerations.
Abstract:In this paper, we investigate unmanned aerial vehicle (UAV) assisted communication systems that require quasi-balanced data rates in uplink (UL) and downlink (DL) and must serve users' heterogeneous traffic. To the best of our knowledge, this is the first work to explicitly investigate joint UL-DL optimization for UAV assisted systems under heterogeneous requirements. A hybrid-mode multiple access (HMMA) scheme is proposed to handle heterogeneous traffic, where non-orthogonal multiple access (NOMA) targets a high average data rate, while orthogonal multiple access (OMA) aims to meet users' instantaneous rate demands by compensating for their rates. HMMA enables a higher degree of freedom in multiple access and achieves a higher minimum average rate among users than the UAV assisted NOMA or OMA schemes. Under HMMA, a joint UL-DL resource allocation algorithm is proposed with a closed-form optimal solution for UL/DL power allocation to achieve quasi-balanced average rates for UL and DL. Furthermore, considering the error propagation in successive interference cancellation (SIC) of NOMA, an enhanced-HMMA scheme is proposed, which demonstrates high robustness against SIC error and a higher minimum average rate than the HMMA scheme.
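To make the NOMA and OMA building blocks concrete, the sketch below computes textbook two-user downlink rates in bits/s/Hz: power-domain NOMA with perfect SIC at the stronger user, and OMA with equal time sharing at full power. It does not model HMMA, the UAV channel, or SIC error propagation; the function names, the power split `alpha_strong`, and the normalized noise power are assumptions for illustration.

```python
import math

def noma_downlink_rates(p_total, alpha_strong, g_strong, g_weak, noise=1.0):
    """Two-user downlink NOMA rates, assuming perfect SIC at the strong user."""
    p_strong = alpha_strong * p_total
    p_weak = (1.0 - alpha_strong) * p_total
    # The strong user removes the weak user's signal before decoding its own.
    r_strong = math.log2(1.0 + p_strong * g_strong / noise)
    # The weak user treats the strong user's signal as interference.
    r_weak = math.log2(1.0 + p_weak * g_weak / (p_strong * g_weak + noise))
    return r_strong, r_weak

def oma_downlink_rates(p_total, g_strong, g_weak, noise=1.0):
    """Two-user OMA rates with equal time sharing and full power per slot."""
    r_strong = 0.5 * math.log2(1.0 + p_total * g_strong / noise)
    r_weak = 0.5 * math.log2(1.0 + p_total * g_weak / noise)
    return r_strong, r_weak
```

For example, with `p_total=1.0`, `alpha_strong=0.2`, `g_strong=10.0`, and `g_weak=1.0`, the weaker user's rate is higher under NOMA than under equal-share OMA, which illustrates why the choice of access mode matters when the objective is the minimum rate among users.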