Sherman
Abstract:The rapid expansion of data from diverse sources has made anomaly detection (AD) increasingly essential for identifying unexpected observations that may signal system failures, security breaches, or fraud. As datasets become more complex and high-dimensional, traditional detection methods struggle to effectively capture intricate patterns. Advances in deep learning have made AD methods more powerful and adaptable, improving their ability to handle high-dimensional and unstructured data. This survey provides a comprehensive review of over 180 recent studies, focusing on deep learning-based AD techniques. We categorize and analyze these methods into reconstruction-based and prediction-based approaches, highlighting their effectiveness in modeling complex data distributions. Additionally, we explore the integration of traditional and deep learning methods, highlighting how hybrid approaches combine the interpretability of traditional techniques with the flexibility of deep learning to enhance detection accuracy and model transparency. Finally, we identify open issues and propose future research directions to advance the field of AD. This review bridges gaps in existing literature and serves as a valuable resource for researchers and practitioners seeking to enhance AD techniques using deep learning.
Abstract:The trackers based on lightweight neural networks have achieved great success in the field of aerial remote sensing, most of which aggregate multi-stage deep features to lift the tracking quality. However, existing algorithms usually only generate single-stage fusion features for state decision, which ignore that diverse kinds of features are required for identifying and locating the object, limiting the robustness and precision of tracking. In this paper, we propose a novel target-aware Bidirectional Fusion transformer (BFTrans) for UAV tracking. Specifically, we first present a two-stream fusion network based on linear self and cross attentions, which can combine the shallow and the deep features from both forward and backward directions, providing the adjusted local details for location and global semantics for recognition. Besides, a target-aware positional encoding strategy is designed for the above fusion model, which is helpful to perceive the object-related attributes during the fusion phase. Finally, the proposed method is evaluated on several popular UAV benchmarks, including UAV-123, UAV20L and UAVTrack112. Massive experimental results demonstrate that our approach can exceed other state-of-the-art trackers and run with an average speed of 30.5 FPS on embedded platform, which is appropriate for practical drone deployments.
Abstract:Generative AI (GenAI) is driving the intelligence of wireless communications. Due to data limitations, random generation, and dynamic environments, GenAI may generate channel information or optimization strategies that violate physical laws or deviate from actual real-world requirements. We refer to this phenomenon as wireless hallucination, which results in invalid channel information, spectrum wastage, and low communication reliability but remains underexplored. To address this gap, this article provides a comprehensive concept of wireless hallucinations in GenAI-driven communications, focusing on hallucination mitigation. Specifically, we first introduce the fundamental, analyze its causes based on the GenAI workflow, and propose mitigation solutions at the data, model, and post-generation levels. Then, we systematically examines representative hallucination scenarios in GenAI-enabled communications and their corresponding solutions. Finally, we propose a novel integrated mitigation solution for GenAI-based channel estimation. At the data level, we establish a channel estimation hallucination dataset and employ generative adversarial networks (GANs)-based data augmentation. Additionally, we incorporate attention mechanisms and large language models (LLMs) to enhance both training and inference performance. Experimental results demonstrate that the proposed hybrid solutions reduce the normalized mean square error (NMSE) by 0.19, effectively reducing wireless hallucinations.
Abstract:Low-Altitude Economy Networks (LAENets) have emerged as significant enablers of social activities, offering low-altitude services such as the transportation of packages, groceries, and medical supplies. Unlike traditional terrestrial networks, LAENets are characterized by control mechanisms and ever-changing operational factors, which make them more complex and susceptible to vulnerabilities. As applications of LAENet continue to expand, robustness of these systems becomes crucial. In this paper, we investigate a novel application of Generative Artificial Intelligence (GenAI) to improve the robustness of LAENets. We conduct a systematic analysis of robustness requirements for LAENets, complemented by a comprehensive review of robust Quality of Service (QoS) metrics from the wireless physical layer perspective. We then investigate existing GenAI-enabled approaches for robustness enhancement. This leads to our proposal of a novel diffusion-based optimization framework with a Mixture of Expert (MoE)-transformer actor network. In the robust beamforming case study, the proposed framework demonstrates its effectiveness by optimizing beamforming under uncertainties, achieving a more than 44% increase in the worst-case achievable secrecy rate. These findings highlight the significant potential of GenAI in strengthening LAENet robustness.
Abstract:Integrated sensing and communication (ISAC) uses the same software and hardware resources to achieve both communication and sensing functionalities. Thus, it stands as one of the core technologies of 6G and has garnered significant attention in recent years. In ISAC systems, a variety of machine learning models are trained to analyze and identify signal patterns, thereby ensuring reliable sensing and communications. However, considering factors such as communication rates, costs, and privacy, collecting sufficient training data from various ISAC scenarios for these models is impractical. Hence, this paper introduces a generative AI (GenAI) enabled robust data augmentation scheme. The scheme first employs a conditioned diffusion model trained on a limited amount of collected CSI data to generate new samples, thereby expanding the sample quantity. Building on this, the scheme further utilizes another diffusion model to enhance the sample quality, thereby facilitating the data augmentation in scenarios where the original sensing data is insufficient and unevenly distributed. Moreover, we propose a novel algorithm to estimate the acceleration and jerk of signal propagation path length changes from CSI. We then use the proposed scheme to enhance the estimated parameters and detect the number of targets based on the enhanced data. The evaluation reveals that our scheme improves the detection performance by up to 70%, demonstrating reliability and robustness, which supports the deployment and practical use of the ISAC network.
Abstract:Due to massive computational demands of large generative models, AI-Generated Content (AIGC) can organize collaborative Mobile AIGC Service Providers (MASPs) at network edges to provide ubiquitous and customized content generation for resource-constrained users. However, such a paradigm faces two significant challenges: 1) raw prompts (i.e., the task description from users) often lead to poor generation quality due to users' lack of experience with specific AIGC models, and 2) static service provisioning fails to efficiently utilize computational and communication resources given the heterogeneity of AIGC tasks. To address these challenges, we propose an intelligent mobile AIGC service scheme. Firstly, we develop an interactive prompt engineering mechanism that leverages a Large Language Model (LLM) to generate customized prompt corpora and employs Inverse Reinforcement Learning (IRL) for policy imitation through small-scale expert demonstrations. Secondly, we formulate a dynamic mobile AIGC service provisioning problem that jointly optimizes the number of inference trials and transmission power allocation. Then, we propose the Diffusion-Enhanced Deep Deterministic Policy Gradient (D3PG) algorithm to solve the problem. By incorporating the diffusion process into Deep Reinforcement Learning (DRL) architecture, the environment exploration capability can be improved, thus adapting to varying mobile AIGC scenarios. Extensive experimental results demonstrate that our prompt engineering approach improves single-round generation success probability by 6.3 times, while D3PG increases the user service experience by 67.8% compared to baseline DRL approaches.
Abstract:This paper proposes a UAV-assisted forwarding system based on distributed beamforming to enhance age of information (AoI) in Internet of Things (IoT). Specifically, UAVs collect and relay data between sensor nodes (SNs) and the remote base station (BS). However, flight delays increase the AoI and degrade the network performance. To mitigate this, we adopt distributed beamforming to extend the communication range, reduce the flight frequency and ensure the continuous data relay and efficient energy utilization. Then, we formulate an optimization problem to minimize AoI and UAV energy consumption, by jointly optimizing the UAV trajectories and communication schedules. The problem is non-convex and with high dynamic, and thus we propose a deep reinforcement learning (DRL)-based algorithm to solve the problem, thereby enhancing the stability and accelerate convergence speed. Simulation results show that the proposed algorithm effectively addresses the problem and outperforms other benchmark algorithms.
Abstract:Unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) and data collection (DC) have been popular research issues. Different from existing works that consider MEC and DC scenarios separately, this paper investigates a multi-UAV-assisted joint MEC-DC system. Specifically, we formulate a joint optimization problem to minimize the MEC latency and maximize the collected data volume. This problem can be classified as a non-convex mixed integer programming problem that exhibits long-term optimization and dynamics. Thus, we propose a deep reinforcement learning-based approach that jointly optimizes the UAV movement, user transmit power, and user association in real time to solve the problem efficiently. Specifically, we reformulate the optimization problem into an action space-reduced Markov decision process (MDP) and optimize the user association by using a two-phase matching-based association (TMA) strategy. Subsequently, we propose a soft actor-critic (SAC)-based approach that integrates the proposed TMA strategy (SAC-TMA) to solve the formulated joint optimization problem collaboratively. Simulation results demonstrate that the proposed SAC-TMA is able to coordinate the two subsystems and can effectively reduce the system latency and improve the data collection volume compared with other benchmark algorithms.
Abstract:Multi-class cell segmentation in high-resolution gigapixel whole slide images (WSI) is crucial for various clinical applications. However, training such models typically requires labor-intensive, pixel-wise annotations by domain experts. Recent efforts have democratized this process by involving lay annotators without medical expertise. However, conventional non-agent-based approaches struggle to handle annotation noise adaptively, as they lack mechanisms to mitigate false positives (FP) and false negatives (FN) at both the image-feature and pixel levels. In this paper, we propose a consensus-aware self-corrective AI agent that leverages the Consensus Matrix to guide its learning process. The Consensus Matrix defines regions where both the AI and annotators agree on cell and non-cell annotations, which are prioritized with stronger supervision. Conversely, areas of disagreement are adaptively weighted based on their feature similarity to high-confidence agreement regions, with more similar regions receiving greater attention. Additionally, contrastive learning is employed to separate features of noisy regions from those of reliable agreement regions by maximizing their dissimilarity. This paradigm enables the AI to iteratively refine noisy labels, enhancing its robustness. Validated on one real-world lay-annotated cell dataset and two simulated noisy datasets, our method demonstrates improved segmentation performance, effectively correcting FP and FN errors and showcasing its potential for training robust models on noisy datasets. The official implementation and cell annotations are publicly available at https://github.com/ddrrnn123/CASC-AI.
Abstract:Employing self-supervised learning (SSL) methodologies assumes par-amount significance in handling unlabeled polyp datasets when building deep learning-based automatic polyp segmentation models. However, the intricate privacy dynamics surrounding medical data often preclude seamless data sharing among disparate medical centers. Federated learning (FL) emerges as a formidable solution to this privacy conundrum, yet within the realm of FL, optimizing model generalization stands as a pressing imperative. Robust generalization capabilities are imperative to ensure the model's efficacy across diverse geographical domains post-training on localized client datasets. In this paper, a Federated self-supervised Domain Generalization method is proposed to enhance the generalization capacity of federated and Label-efficient intestinal polyp segmentation, named LFDG. Based on a classical SSL method, DropPos, LFDG proposes an adversarial learning-based data augmentation method (SSADA) to enhance the data diversity. LFDG further proposes a relaxation module based on Source-reconstruction and Augmentation-masking (SRAM) to maintain stability in feature learning. We have validated LFDG on polyp images from six medical centers. The performance of our method achieves 3.80% and 3.92% better than the baseline and other recent FL methods and SSL methods, respectively.