Abstract:Out-of-distribution (OOD) detection methods have recently become more prominent, serving as a core element in safety-critical autonomous systems. One major purpose of OOD detection is to reject invalid inputs that could lead to unpredictable errors and compromise safety. Due to the cost of labeled data, recent works have investigated the feasibility of self-supervised learning (SSL) OOD detection, unlabeled OOD detection, and zero shot OOD detection. In this work, we identify a set of conditions for a theoretical guarantee of failure in unlabeled OOD detection algorithms from an information-theoretic perspective. These conditions are present in all OOD tasks dealing with real-world data: I) we provide theoretical proof of unlabeled OOD detection failure when there exists zero mutual information between the learning objective and the in-distribution labels, a.k.a. 'label blindness', II) we define a new OOD task - Adjacent OOD detection - that tests for label blindness and accounts for a previously ignored safety gap in all OOD detection benchmarks, and III) we perform experiments demonstrating that existing unlabeled OOD methods fail under conditions suggested by our label blindness theory and analyze the implications for future research in unlabeled OOD methods.
Abstract:For large-scale multi-gateway LoRaWAN networks, this study proposes a cloud-edge collaborative resource allocation and decision-making method based on edge intelligence, HEAT-LDL (HEAT-Local Distill Lyapunov), which realizes collaborative decision-making between gateways and terminal nodes. HEAT-LDL combines the Actor-Critic architecture and the Lyapunov optimization method to achieve intelligent downlink control and gateway load balancing. When the signal quality is good, the network server uses the HEAT algorithm to schedule the terminal nodes. To improve the efficiency of autonomous decision-making of terminal nodes, HEAT-LDL performs cloud-edge knowledge distillation on the HEAT teacher model on the terminal node side. When the downlink decision instruction is lost, the terminal node uses the student model and the edge decider based on prior knowledge and local history to make collaborative autonomous decisions. Simulation experiments show that compared with the optimal results of all compared algorithms, HEAT-LDL improves the packet success rate and energy efficiency by 20.5% and 88.1%, respectively.
Abstract:For a single-gateway LoRaWAN network, this study proposed a history-enhanced two-phase actor-critic algorithm with a shared transformer algorithm (HEAT) to improve network performance. HEAT considers uplink parameters and often neglected downlink parameters, and effectively integrates offline and online reinforcement learning, using historical data and real-time interaction to improve model performance. In addition, this study developed an open source LoRaWAN network simulator LoRaWANSim. The simulator considers the demodulator lock effect and supports multi-channel, multi-demodulator and bidirectional communication. Simulation experiments show that compared with the best results of all compared algorithms, HEAT improves the packet success rate and energy efficiency by 15% and 95%, respectively.
Abstract:This paper introduces a new video question-answering (VQA) dataset that challenges models to leverage procedural knowledge for complex reasoning. It requires recognizing visual entities, generating hypotheses, and performing contextual, causal, and counterfactual reasoning. To address this, we propose neuro symbolic reasoning module that integrates neural networks and LLM-driven constrained reasoning over variables for interpretable answer generation. Results show that combining LLMs with structured knowledge reasoning with logic enhances procedural reasoning on the STAR benchmark and our dataset. Code and dataset at https://github.com/LUNAProject22/KML soon.
Abstract:This study employed over 100 hours of high-altitude drone video data from eight intersections in Hohhot to generate a unique and extensive dataset encompassing high-density urban road intersections in China. This research has enhanced the YOLOUAV model to enable precise target recognition on unmanned aerial vehicle (UAV) datasets. An automated calibration algorithm is presented to create a functional dataset in high-density traffic flows, which saves human and material resources. This algorithm can capture up to 200 vehicles per frame while accurately tracking over 1 million road users, including cars, buses, and trucks. Moreover, the dataset has recorded over 50,000 complete lane changes. It is the largest publicly available road user trajectories in high-density urban intersections. Furthermore, this paper updates speed and acceleration algorithms based on UAV elevation and implements a UAV offset correction algorithm. A case study demonstrates the usefulness of the proposed methods, showing essential parameters to evaluate intersections and traffic conditions in traffic engineering. The model can track more than 200 vehicles of different types simultaneously in highly dense traffic on an urban intersection in Hohhot, generating heatmaps based on spatial-temporal traffic flow data and locating traffic conflicts by conducting lane change analysis and surrogate measures. With the diverse data and high accuracy of results, this study aims to advance research and development of UAVs in transportation significantly. The High-Density Intersection Dataset is available for download at https://github.com/Qpu523/High-density-Intersection-Dataset.
Abstract:We address the design and optimization of real-world-suitable hybrid precoders for multi-user wideband sub-terahertz (sub-THz) communications. We note that the conventional fully connected true-time delay (TTD)-based architecture is impractical because there is no room for the required large number of analog signal combiners in the circuit board. Additionally, analog signal combiners incur significant signal power loss. These limitations are often overlooked in sub-THz research. To overcome these issues, we study a non-overlapping subarray architecture that eliminates the need for analog combiners. We extend the conventional single-user assumption by formulating an optimization problem to maximize the minimum data rate for simultaneously served users. This complex optimization problem is divided into two sub-problems. The first sub-problem aims to ensure a fair subarray allocation for all users and is solved via a continuous domain relaxation technique. The second sub-problem deals with practical TTD device constraints on range and resolution to maximize the subarray gain and is resolved by shifting to the phase domain. Our simulation results highlight significant performance gain for our real-world-ready TTD-based hybrid precoders.
Abstract:We develop a graph neural network (GNN) to compute, within a time budget of 1 to 2 milliseconds required by practical systems, the optimal linear precoder (OLP) maximizing the minimal downlink user data rate for a Cell-Free Massive MIMO system - a key 6G wireless technology. The state-of-the-art method is a bisection search on second order cone programming feasibility test (B-SOCP) which is a magnitude too slow for practical systems. Our approach relies on representing OLP as a node-level prediction task on a graph. We construct a graph that accurately captures the interdependence relation between access points (APs) and user equipments (UEs), and the permutation equivariance of the Max-Min problem. Our neural network, named OLP-GNN, is trained on data obtained by B-SOCP. We tailor the OLP-GNN size, together with several artful data preprocessing and postprocessing methods to meet the runtime requirement. We show by extensive simulations that it achieves near optimal spectral efficiency in a range of scenarios with different number of APs and UEs, and for both line-of-sight and non-line-of-sight radio propagation environments.
Abstract:With the advent of image super-resolution (SR) algorithms, how to evaluate the quality of generated SR images has become an urgent task. Although full-reference methods perform well in SR image quality assessment (SR-IQA), their reliance on high-resolution (HR) images limits their practical applicability. Leveraging available reconstruction information as much as possible for SR-IQA, such as low-resolution (LR) images and the scale factors, is a promising way to enhance assessment performance for SR-IQA without HR for reference. In this letter, we attempt to evaluate the perceptual quality and reconstruction fidelity of SR images considering LR images and scale factors. Specifically, we propose a novel dual-branch reduced-reference SR-IQA network, \ie, Perception- and Fidelity-aware SR-IQA (PFIQA). The perception-aware branch evaluates the perceptual quality of SR images by leveraging the merits of global modeling of Vision Transformer (ViT) and local relation of ResNet, and incorporating the scale factor to enable comprehensive visual perception. Meanwhile, the fidelity-aware branch assesses the reconstruction fidelity between LR and SR images through their visual perception. The combination of the two branches substantially aligns with the human visual system, enabling a comprehensive SR image evaluation. Experimental results indicate that our PFIQA outperforms current state-of-the-art models across three widely-used SR-IQA benchmarks. Notably, PFIQA excels in assessing the quality of real-world SR images.
Abstract:This paper focuses on a real-time vehicle detection and urban traffic behavior analysis system based on Unmanned Aerial Vehicle (UAV) traffic video. By using UAV to collect traffic data and combining the YOLOv8 model and SORT tracking algorithm, the object detection and tracking functions are implemented on the iOS mobile platform. For the problem of traffic data acquisition and analysis, the dynamic computing method is used to process the performance in real time and calculate the micro and macro traffic parameters of the vehicles, and real-time traffic behavior analysis is conducted and visualized. The experiment results reveals that the vehicle object detection can reach 98.27% precision rate and 87.93% recall rate, and the real-time processing capacity is stable at 30 frames per seconds. This work integrates drone technology, iOS development, and deep learning techniques to integrate traffic video acquisition, object detection, object tracking, and traffic behavior analysis functions on mobile devices. It provides new possibilities for lightweight traffic information collection and data analysis, and offers innovative solutions to improve the efficiency of analyzing road traffic conditions and addressing transportation issues for transportation authorities.
Abstract:Heterogeneous graph neural architecture search (HGNAS) represents a powerful tool for automatically designing effective heterogeneous graph neural networks. However, existing HGNAS algorithms suffer from inefficient searches and unstable results. In this paper, we present a new GPT-4 based HGNAS model to improve the search efficiency and search accuracy of HGNAS. Specifically, we present a new GPT-4 enhanced Heterogeneous Graph Neural Architecture Search (GHGNAS for short). The basic idea of GHGNAS is to design a set of prompts that can guide GPT-4 toward the task of generating new heterogeneous graph neural architectures. By iteratively asking GPT-4 with the prompts, GHGNAS continually validates the accuracy of the generated HGNNs and uses the feedback to further optimize the prompts. Experimental results show that GHGNAS can design new HGNNs by leveraging the powerful generalization capability of GPT-4. Moreover, GHGNAS runs more effectively and stably than previous HGNAS models based on reinforcement learning and differentiable search algorithms.