Abstract:5G technology enhances industries with high-speed, reliable, low-latency communication, revolutionizing mobile broadband and supporting massive IoT connectivity. With the increasing complexity of applications on User Equipment (UE), offloading resource-intensive tasks to robust servers is essential for improving latency and speed. The 3GPP's Multi-access Edge Computing (MEC) framework addresses this challenge by processing tasks closer to the user, highlighting the need for an intelligent controller to optimize task offloading and resource allocation. This paper introduces a novel methodology to efficiently allocate both communication and computational resources among individual UEs. Our approach integrates two critical 5G service imperatives: Ultra-Reliable Low Latency Communication (URLLC) and Massive Machine Type Communication (mMTC), embedding them into the decision-making framework. Central to this approach is the utilization of Proximal Policy Optimization, providing a robust and efficient solution to the challenges posed by the evolving landscape of 5G technology. The proposed model is evaluated in a simulated 5G MEC environment. The model significantly reduces processing time by 4% for URLLC users under strict latency constraints and decreases power consumption by 26% for mMTC users, compared to existing baseline models based on the reported simulation results. These improvements showcase the model's adaptability and superior performance in meeting diverse QoS requirements in 5G networks.
Abstract:DL-based automatic modulation classification (AMC) models are highly susceptible to adversarial attacks, where even minimal input perturbations can cause severe misclassifications. While adversarially training an AMC model based on an adversarial attack significantly increases its robustness against that attack, the AMC model will still be defenseless against other adversarial attacks. The theoretically infinite possibilities for adversarial perturbations mean that an AMC model will inevitably encounter new unseen adversarial attacks if it is ever to be deployed to a real-world communication system. Moreover, the computational limitations and challenges of obtaining new data in real-time will not allow a full training process for the AMC model to adapt to the new attack when it is online. To this end, we propose a meta-learning-based adversarial training framework for AMC models that substantially enhances robustness against unseen adversarial attacks and enables fast adaptation to these attacks using just a few new training samples, if any are available. Our results demonstrate that this training framework provides superior robustness and accuracy with much less online training time than conventional adversarial training of AMC models, making it highly efficient for real-world deployment.
Abstract:The increasing accessibility of radiometric thermal imaging sensors for unmanned aerial vehicles (UAVs) offers significant potential for advancing AI-driven aerial wildfire management. Radiometric imaging provides per-pixel temperature estimates, a valuable improvement over non-radiometric data that requires irradiance measurements to be converted into visible images using RGB color palettes. Despite its benefits, this technology has been underutilized largely due to a lack of available data for researchers. This study addresses this gap by introducing methods for collecting and processing synchronized visual spectrum and radiometric thermal imagery using UAVs at prescribed fires. The included imagery processing pipeline drastically simplifies and partially automates each step from data collection to neural network input. Further, we present the FLAME 3 dataset, the first comprehensive collection of side-by-side visual spectrum and radiometric thermal imagery of wildland fires. Building on our previous FLAME 1 and FLAME 2 datasets, FLAME 3 includes radiometric thermal Tag Image File Format (TIFFs) and nadir thermal plots, providing a new data type and collection method. This dataset aims to spur a new generation of machine learning models utilizing radiometric thermal imagery, potentially trivializing tasks such as aerial wildfire detection, segmentation, and assessment. A single-burn subset of FLAME 3 for computer vision applications is available on Kaggle with the full 6 burn set available to readers upon request.
Abstract:Pre-trained Vision-language (VL) models, such as CLIP, have shown significant generalization ability to downstream tasks, even with minimal fine-tuning. While prompt learning has emerged as an effective strategy to adapt pre-trained VL models for downstream tasks, current approaches frequently encounter severe overfitting to specific downstream data distributions. This overfitting constrains the original behavior of the VL models to generalize to new domains or unseen classes, posing a critical challenge in enhancing the adaptability and generalization of VL models. To address this limitation, we propose Style-Pro, a novel style-guided prompt learning framework that mitigates overfitting and preserves the zero-shot generalization capabilities of CLIP. Style-Pro employs learnable style bases to synthesize diverse distribution shifts, guided by two specialized loss functions that ensure style diversity and content integrity. Then, to minimize discrepancies between unseen domains and the source domain, Style-Pro maps the unseen styles into the known style representation space as a weighted combination of style bases. Moreover, to maintain consistency between the style-shifted prompted model and the original frozen CLIP, Style-Pro introduces consistency constraints to preserve alignment in the learned embeddings, minimizing deviation during adaptation to downstream tasks. Extensive experiments across 11 benchmark datasets demonstrate the effectiveness of Style-Pro, consistently surpassing state-of-the-art methods in various settings, including base-to-new generalization, cross-dataset transfer, and domain generalization.
Abstract:Recent advancements in anomaly detection have shifted focus towards Multi-class Unified Anomaly Detection (MUAD), offering more scalable and practical alternatives compared to traditional one-class-one-model approaches. However, existing MUAD methods often suffer from inter-class interference and are highly susceptible to domain shifts, leading to substantial performance degradation in real-world applications. In this paper, we propose a novel robust prompt-driven MUAD framework, called ROADS, to address these challenges. ROADS employs a hierarchical class-aware prompt integration mechanism that dynamically encodes class-specific information into our anomaly detector to mitigate interference among anomaly classes. Additionally, ROADS incorporates a domain adapter to enhance robustness against domain shifts by learning domain-invariant representations. Extensive experiments on MVTec-AD and VISA datasets demonstrate that ROADS surpasses state-of-the-art methods in both anomaly detection and localization, with notable improvements in out-of-distribution settings.
Abstract:Deep neural networks (DNNs) are frequently employed in a variety of computer vision applications. Nowadays, an emerging trend in the current video distribution system is to take advantage of DNN's overfitting properties to perform video resolution upscaling. By splitting videos into chunks and applying a super-resolution (SR) model to overfit each chunk, this scheme of SR models plus video chunks is able to replace traditional video transmission to enhance video quality and transmission efficiency. However, many models and chunks are needed to guarantee high performance, which leads to tremendous overhead on model switching and memory footprints at the user end. To resolve such problems, we propose a Dynamic Deep neural network assisted by a Content-Aware data processing pipeline to reduce the model number down to one (Dy-DCA), which helps promote performance while conserving computational resources. Additionally, to achieve real acceleration on the user end, we designed a framework that optimizes dynamic features (e.g., dynamic shapes, sizes, and control flow) in Dy-DCA to enable a series of compilation optimizations, including fused code generation, static execution planning, etc. By employing such techniques, our method achieves better PSNR and real-time performance (33 FPS) on an off-the-shelf mobile phone. Meanwhile, assisted by our compilation optimization, we achieve a 1.7$\times$ speedup while saving up to 1.61$\times$ memory consumption. Code available in https://github.com/coulsonlee/Dy-DCA-ECCV2024.
Abstract:FlameFinder is a deep metric learning (DML) framework designed to accurately detect flames, even when obscured by smoke, using thermal images from firefighter drones during wildfire monitoring. Traditional RGB cameras struggle in such conditions, but thermal cameras can capture smoke-obscured flame features. However, they lack absolute thermal reference points, leading to false positives.To address this issue, FlameFinder utilizes paired thermal-RGB images for training. By learning latent flame features from smoke-free samples, the model becomes less biased towards relative thermal gradients. In testing, it identifies flames in smoky patches by analyzing their equivalent thermal-domain distribution. This method improves performance using both supervised and distance-based clustering metrics.The framework incorporates a flame segmentation method and a DML-aided detection framework. This includes utilizing center loss (CL), triplet center loss (TCL), and triplet cosine center loss (TCCL) to identify optimal cluster representatives for classification. However, the dominance of center loss over the other losses leads to the model missing features sensitive to them. To address this limitation, an attention mechanism is proposed. This mechanism allows for non-uniform feature contribution, amplifying the critical role of cosine and triplet loss in the DML framework. Additionally, it improves interpretability, class discrimination, and decreases intra-class variance. As a result, the proposed model surpasses the baseline by 4.4% in the FLAME2 dataset and 7% in the FLAME3 dataset for unobscured flame detection accuracy. Moreover, it demonstrates enhanced class separation in obscured scenarios compared to VGG19, ResNet18, and three backbone models tailored for flame detection.
Abstract:Motivated by agility, 3D mobility, and low-risk operation compared to human-operated management systems of autonomous unmanned aerial vehicles (UAVs), this work studies UAV-based active wildfire monitoring where a UAV detects fire incidents in remote areas and tracks the fire frontline. A UAV path planning solution is proposed considering realistic wildfire management missions, where a single low-altitude drone with limited power and flight time is available. Noting the limited field of view of commercial low-altitude UAVs, the problem formulates as a partially observable Markov decision process (POMDP), in which wildfire progression outside the field of view causes inaccurate state representation that prevents the UAV from finding the optimal path to track the fire front in limited time. Common deep reinforcement learning (DRL)-based trajectory planning solutions require diverse drone-recorded wildfire data to generalize pre-trained models to real-time systems, which is not currently available at a diverse and standard scale. To narrow down the gap caused by partial observability in the space of possible policies, a belief-based state representation with broad, extensive simulated data is proposed where the beliefs (i.e., ignition probabilities of different grid areas) are updated using a Bayesian framework for the cells within the field of view. The performance of the proposed solution in terms of the ratio of detected fire cells and monitored ignited area (MIA) is evaluated in a complex fire scenario with multiple rapidly growing fire batches, indicating that the belief state representation outperforms the observation state representation both in fire coverage and the distance to fire frontline.
Abstract:Remote patient monitoring based on wearable single-lead electrocardiogram (ECG) devices has significant potential for enabling the early detection of heart disease, especially in combination with artificial intelligence (AI) approaches for automated heart disease detection. There have been prior studies applying AI approaches based on deep learning for heart disease detection. However, these models are yet to be widely accepted as a reliable aid for clinical diagnostics, in part due to the current black-box perception surrounding many AI algorithms. In particular, there is a need to identify the key features of the ECG signal that contribute toward making an accurate diagnosis, thereby enhancing the interpretability of the model. In the present study, we develop a vision transformer approach to identify atrial fibrillation based on single-lead ECG data. A residual network (ResNet) approach is also developed for comparison with the vision transformer approach. These models are applied to the Chapman-Shaoxing dataset to classify atrial fibrillation, as well as another common arrhythmia, sinus bradycardia, and normal sinus rhythm heartbeats. The models enable the identification of the key regions of the heartbeat that determine the resulting classification, and highlight the importance of P-waves and T-waves, as well as heartbeat duration and signal amplitude, in distinguishing normal sinus rhythm from atrial fibrillation and sinus bradycardia.
Abstract:Unmanned aerial vehicles (UAVs) offer a flexible and cost-effective solution for wildfire monitoring. However, their widespread deployment during wildfires has been hindered by a lack of operational guidelines and concerns about potential interference with aircraft systems. Consequently, the progress in developing deep-learning models for wildfire detection and characterization using aerial images is constrained by the limited availability, size, and quality of existing datasets. This paper introduces a solution aimed at enhancing the quality of current aerial wildfire datasets to align with advancements in camera technology. The proposed approach offers a solution to create a comprehensive, standardized large-scale image dataset. This paper presents a pipeline based on CycleGAN to enhance wildfire datasets and a novel fusion method that integrates paired RGB images as attribute conditioning in the generators of both directions, improving the accuracy of the generated images.