Chunyan Xu

Multi-level Attention-guided Graph Neural Network for Image Restoration

Feb 26, 2025

Jiatao Jiang, Zhen Cui, Chunyan Xu, Jian Yang

Abstract: In recent years, deep learning has achieved remarkable success in the field of image restoration. However, most convolutional neural network-based methods typically focus on a single scale, neglecting the incorporation of multi-scale information. In image restoration tasks, local features of an image are often insufficient, necessitating the integration of global features to complement them. Although recent neural network algorithms have made significant strides in feature extraction, many models do not explicitly model global features or consider the relationship between global and local features. This paper proposes a multi-level attention-guided graph neural network. The proposed network explicitly constructs element block graphs and element graphs within feature maps using multi-attention mechanisms to extract both local structural features and global representation information of the image. Since the network struggles to effectively extract global information during image degradation, the structural information of local feature blocks can be used to correct and supplement the global information. Similarly, when element block information in the feature map is missing, it can be refined using global element representation information. The graph within the network learns real-time dynamic connections through the multi-attention mechanism, and information is propagated and aggregated via graph convolution algorithms. By combining local element block information and global element representation information from the feature map, the algorithm can more effectively restore missing information in the image. Experimental results on several classic image restoration tasks demonstrate the effectiveness of the proposed method, achieving state-of-the-art performance.
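
The idea of learning dynamic graph connections through attention and then aggregating over them can be illustrated with a minimal sketch. It assumes scaled dot-product attention over node features and a single softmax-weighted aggregation step; the function names and the choice of attention are illustrative assumptions, not the paper's actual architecture.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_graph_aggregate(features):
    """Aggregate node features over an attention-derived graph.

    features: list of equal-length feature vectors, one per node
    (hypothetically, one per element or element block of a feature map).
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    aggregated = []
    for query in features:
        # Edge weights from this node to every node (assumed dot-product attention).
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in features]
        weights = softmax(scores)
        # Weighted sum of neighbor features: one propagation/aggregation step.
        aggregated.append(
            [sum(w * node[d] for w, node in zip(weights, features)) for d in range(dim)]
        )
    return aggregated
```

Because each row of weights sums to one, every output vector is a convex combination of the input node features.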

MMM-RS: A Multi-modal, Multi-GSD, Multi-scene Remote Sensing Dataset and Benchmark for Text-to-Image Generation

Oct 26, 2024

Jialin Luo, Yuanzhi Wang, Ziqi Gu, Yide Qiu, Shuaizhen Yao, Fuyun Wang, Chunyan Xu, Wenhua Zhang, Dan Wang, Zhen Cui

Abstract: Recently, the diffusion-based generative paradigm has achieved impressive general image generation capabilities with text prompts due to its accurate distribution modeling and stable training process. However, generating diverse remote sensing (RS) images that are tremendously different from general images in terms of scale and perspective remains a formidable challenge due to the lack of a comprehensive remote sensing image generation dataset with various modalities, ground sample distances (GSD), and scenes. In this paper, we propose a Multi-modal, Multi-GSD, Multi-scene Remote Sensing (MMM-RS) dataset and benchmark for text-to-image generation in diverse remote sensing scenarios. Specifically, we first collect nine publicly available RS datasets and conduct standardization for all samples. To bridge RS images to textual semantic information, we utilize a large-scale pretrained vision-language model to automatically output text prompts and perform hand-crafted rectification, resulting in information-rich text-image pairs (including multi-modal images). In particular, we design some methods to obtain the images with different GSD and various environments (e.g., low-light, foggy) in a single sample. With extensive manual screening and refining annotations, we ultimately obtain an MMM-RS dataset that comprises approximately 2.1 million text-image pairs. Extensive experimental results verify that our proposed MMM-RS dataset allows off-the-shelf diffusion models to generate diverse RS images across various modalities, scenes, weather conditions, and GSD. The dataset is available at https://github.com/ljl5261/MMM-RS.

* Accepted by NeurIPS 2024

Frequency-Spatial Entanglement Learning for Camouflaged Object Detection

Sep 03, 2024

Yanguang Sun, Chunyan Xu, Jian Yang, Hanyu Xuan, Lei Luo

Abstract:Camouflaged object detection has attracted a lot of attention in computer vision. The main challenge lies in the high degree of similarity between camouflaged objects and their surroundings in the spatial domain, making identification difficult. Existing methods attempt to reduce the impact of pixel similarity by maximizing the distinguishing ability of spatial features with complicated design, but often ignore the sensitivity and locality of features in the spatial domain, leading to sub-optimal results. In this paper, we propose a new approach to address this issue by jointly exploring the representation in the frequency and spatial domains, introducing the Frequency-Spatial Entanglement Learning (FSEL) method. This method consists of a series of well-designed Entanglement Transformer Blocks (ETB) for representation learning, a Joint Domain Perception Module for semantic enhancement, and a Dual-domain Reverse Parser for feature integration in the frequency and spatial domains. Specifically, the ETB utilizes frequency self-attention to effectively characterize the relationship between different frequency bands, while the entanglement feed-forward network facilitates information interaction between features of different domains through entanglement learning. Our extensive experiments demonstrate the superiority of our FSEL over 21 state-of-the-art methods, through comprehensive quantitative and qualitative comparisons in three widely-used datasets. The source code is available at: https://github.com/CSYSI/FSEL.

* Accepted at ECCV 2024

Multi-clue Consistency Learning to Bridge Gaps Between General and Oriented Object in Semi-supervised Detection

Jul 08, 2024

Chenxu Wang, Chunyan Xu, Ziqi Gu, Zhen Cui

Abstract:While existing semi-supervised object detection (SSOD) methods perform well in general scenes, they encounter challenges in handling oriented objects in aerial images. We experimentally find three gaps between general and oriented object detection in semi-supervised learning: 1) Sampling inconsistency: the common center sampling is not suitable for oriented objects with larger aspect ratios when selecting positive labels from labeled data. 2) Assignment inconsistency: balancing the precision and localization quality of oriented pseudo-boxes poses greater challenges which introduces more noise when selecting positive labels from unlabeled data. 3) Confidence inconsistency: there exists more mismatch between the predicted classification and localization qualities when considering oriented objects, affecting the selection of pseudo-labels. Therefore, we propose a Multi-clue Consistency Learning (MCL) framework to bridge gaps between general and oriented objects in semi-supervised detection. Specifically, considering various shapes of rotated objects, the Gaussian Center Assignment is specially designed to select the pixel-level positive labels from labeled data. We then introduce the Scale-aware Label Assignment to select pixel-level pseudo-labels instead of unreliable pseudo-boxes, which is a divide-and-rule strategy suited for objects with various scales. The Consistent Confidence Soft Label is adopted to further boost the detector by maintaining the alignment of the predicted results. Comprehensive experiments on DOTA-v1.5 and DOTA-v1.0 benchmarks demonstrate that our proposed MCL can achieve state-of-the-art performance in the semi-supervised oriented object detection task.
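
To see why a Gaussian-shaped assignment can suit elongated, rotated objects better than fixed center sampling, consider the following minimal sketch. It scores a pixel with a 2-D Gaussian aligned to an oriented box. The formula, the function name, and the choice of standard deviations (a quarter of the box width and height) are assumptions for illustration; the abstract does not specify how the Gaussian Center Assignment is defined.

```python
import math

def gaussian_center_score(px, py, cx, cy, w, h, angle):
    """Score in (0, 1] for pixel (px, py) under a Gaussian aligned to an oriented box.

    The box has center (cx, cy), width w along its own x-axis, height h,
    and is rotated by `angle` radians.
    """
    dx, dy = px - cx, py - cy
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Rotate the offset into the box's local coordinate frame.
    u = dx * cos_a + dy * sin_a
    v = -dx * sin_a + dy * cos_a
    # Assumed spread: one quarter of each side length.
    sigma_u, sigma_v = w / 4.0, h / 4.0
    return math.exp(-0.5 * ((u / sigma_u) ** 2 + (v / sigma_v) ** 2))
```

For a long, thin box, a pixel far along the major axis keeps a meaningful score, while a pixel the same distance away across the minor axis scores close to zero, so the set of positive pixels follows the object's shape and orientation.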

Big-model Driven Few-shot Continual Learning

Sep 02, 2023

Ziqi Gu, Chunyan Xu, Zihan Lu, Xin Liu, Anbo Dai, Zhen Cui

Abstract: Few-shot continual learning (FSCL) has attracted intensive attention and achieved some advances in recent years, but further large gains in accuracy are now difficult because only few-shot incremental samples are available. Inspired by the distinctive human cognitive ability in lifelong learning, in this work, we propose a novel Big-model driven Few-shot Continual Learning (B-FSCL) framework to gradually evolve the model under the traction of the world's big-models (like human accumulative knowledge). Specifically, we perform big-model driven transfer learning to leverage the powerful encoding capability of these existing big-models, which can adapt the continual model to a few newly added samples while avoiding the over-fitting problem. Considering that the big-model and the continual model may have different perceived results for identical images, we introduce an instance-level adaptive decision mechanism to provide highly flexible cognitive support adjusted to varying samples. In turn, the adaptive decision can be further adopted to optimize the parameters of the continual model, performing adaptive distillation of the big-model's knowledge. Experimental results on three popular datasets (CIFAR100, miniImageNet, and CUB200) show that our proposed B-FSCL surpasses all state-of-the-art FSCL methods.

* 9 pages, 6 figures

Learning Normal Dynamics in Videos with Meta Prototype Network

May 10, 2021

Hui Lv, Chen Chen, Zhen Cui, Chunyan Xu, Yong Li, Jian Yang

Abstract: Frame reconstruction (current or future frame) based on Auto-Encoder (AE) is a popular method for video anomaly detection. With models trained on the normal data, the reconstruction errors of anomalous scenes are usually much larger than those of normal ones. Previous methods introduced the memory bank into AE, for encoding diverse normal patterns across the training videos. However, they are memory-consuming and cannot cope with unseen new scenarios in the testing data. In this work, we propose a dynamic prototype unit (DPU) to encode the normal dynamics as prototypes in real time, free from extra memory cost. In addition, we introduce meta-learning to our DPU to form a novel few-shot normalcy learner, namely Meta-Prototype Unit (MPU). It enables fast adaptation to new scenes with only a few update iterations. Extensive experiments are conducted on various benchmarks. The superior performance over the state-of-the-art demonstrates the effectiveness of our method.
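
The reconstruction-error principle described above can be sketched in a few lines. The example assumes frames are given as nested lists of pixel intensities and that per-frame errors are min-max normalized over a video to obtain anomaly scores; the normalization and the function names are assumptions for illustration, and the auto-encoder itself is not modeled.

```python
def mse(frame, reconstruction):
    """Mean squared error between a frame and its reconstruction (2-D nested lists)."""
    squared = [
        (a - b) ** 2
        for row_a, row_b in zip(frame, reconstruction)
        for a, b in zip(row_a, row_b)
    ]
    return sum(squared) / len(squared)

def anomaly_scores(errors):
    """Min-max normalize per-frame errors to [0, 1]; higher means more anomalous."""
    lo, hi = min(errors), max(errors)
    if hi == lo:
        # All frames reconstruct equally well: no frame stands out.
        return [0.0] * len(errors)
    return [(e - lo) / (hi - lo) for e in errors]
```

A model trained only on normal data should reconstruct normal frames with low error, so frames whose normalized score approaches 1 are the candidates for anomalies.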

* 9 pages, 4 figures, 6 tables

Global Information Guided Video Anomaly Detection

Apr 14, 2021

Hui Lv, Chunyan Xu, Zhen Cui

Abstract:Video anomaly detection (VAD) is currently a challenging task due to the complexity of anomaly as well as the lack of labor-intensive temporal annotations. In this paper, we propose an end-to-end Global Information Guided (GIG) anomaly detection framework for anomaly detection using the video-level annotations (i.e., weak labels). We propose to first mine the global pattern cues by leveraging the weak labels in a GIG module. Then we build a spatial reasoning module to measure the relevance between vectors in spatial domain with the global cue vectors, and select the most related feature vectors for temporal anomaly detection. The experimental results on the CityScene challenge demonstrate the effectiveness of our model.
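
The step of selecting the feature vectors most related to a global cue can be illustrated as follows. The sketch assumes cosine similarity as the relevance measure and a fixed top-k cut-off; the abstract names neither, so both are illustrative assumptions, as are the function names.

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def select_top_k(features, cue, k):
    """Return indices of the k feature vectors most similar to the global cue vector."""
    ranked = sorted(
        range(len(features)),
        key=lambda i: cosine(features[i], cue),
        reverse=True,
    )
    return ranked[:k]
```

For example, with features `[[1, 0], [0, 1], [1, 1]]` and cue `[1, 0]`, the two most relevant vectors are those at indices 0 and 2.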

Spatial-Temporal Tensor Graph Convolutional Network for Traffic Prediction

Mar 10, 2021

Xuran Xu, Tong Zhang, Chunyan Xu, Zhen Cui, Jian Yang

Abstract: Accurate traffic prediction is crucial to the guidance and management of urban traffic. However, most existing traffic prediction models do not consider the computational burden and memory space when they capture spatial-temporal dependence among traffic data. In this work, we propose a factorized Spatial-Temporal Tensor Graph Convolutional Network to deal with traffic speed prediction. Traffic networks are modeled and unified into a graph that integrates spatial and temporal information simultaneously. We further extend graph convolution into tensor space and propose a tensor graph convolution network to extract more discriminating features from spatial-temporal graph data. To reduce the computational burden, we take Tucker tensor decomposition and derive a factorized tensor convolution, which performs separate filtering in small-scale space, time, and feature modes. Besides, we can benefit from noise suppression of traffic data when discarding those trivial components in the process of tensor decomposition. Extensive experiments on two real-world traffic speed datasets demonstrate that our method is more effective than traditional traffic prediction methods while achieving state-of-the-art performance.
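
The computational benefit of filtering separately along the space, time, and feature modes can be sketched with plain mode-wise products. The example below is an illustration under stated assumptions: it treats each per-mode filter as a small dense matrix applied along one axis of a 3-D tensor, and it compares parameter counts against a single dense map on the flattened tensor. It is not the paper's tensor graph convolution, and all names are hypothetical.

```python
def mode_product(tensor, matrix, mode):
    """Multiply a 3-D nested-list tensor by `matrix` along axis `mode` (0, 1, or 2)."""
    I, J, K = len(tensor), len(tensor[0]), len(tensor[0][0])
    R = len(matrix)
    if mode == 0:
        return [[[sum(matrix[r][i] * tensor[i][j][k] for i in range(I))
                  for k in range(K)] for j in range(J)] for r in range(R)]
    if mode == 1:
        return [[[sum(matrix[r][j] * tensor[i][j][k] for j in range(J))
                  for k in range(K)] for r in range(R)] for i in range(I)]
    if mode == 2:
        return [[[sum(matrix[r][k] * tensor[i][j][k] for k in range(K))
                  for r in range(R)] for j in range(J)] for i in range(I)]
    raise ValueError("mode must be 0, 1, or 2")

def factorized_filter(tensor, w_space, w_time, w_feat):
    """Apply separate filters along the space, time, and feature modes in turn."""
    out = mode_product(tensor, w_space, 0)
    out = mode_product(out, w_time, 1)
    return mode_product(out, w_feat, 2)

def parameter_counts(n_space, n_time, n_feat):
    """Parameters of one dense map on the flattened tensor vs. three per-mode maps."""
    full = (n_space * n_time * n_feat) ** 2
    factorized = n_space ** 2 + n_time ** 2 + n_feat ** 2
    return full, factorized
```

Even for a tiny 2 x 3 x 4 tensor, `parameter_counts(2, 3, 4)` gives 576 parameters for the dense map against 29 for the three per-mode maps, and the gap widens quickly as the dimensions grow.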

Attention-Aware Noisy Label Learning for Image Classification

Sep 30, 2020

Zhenzhen Wang, Chunyan Xu, Yap-Peng Tan, Junsong Yuan

Abstract:Deep convolutional neural networks (CNNs) learned on large-scale labeled samples have achieved remarkable progress in computer vision, such as image/video classification. The cheapest way to obtain a large body of labeled visual data is to crawl from websites with user-supplied labels, such as Flickr. However, these samples often tend to contain incorrect labels (i.e. noisy labels), which will significantly degrade the network performance. In this paper, the attention-aware noisy label learning approach ($A^2NL$) is proposed to improve the discriminative capability of the network trained on datasets with potential label noise. Specifically, a Noise-Attention model, which contains multiple noise-specific units, is designed to better capture noisy information. Each unit is expected to learn a specific noisy distribution for a subset of images so that different disturbances are more precisely modeled. Furthermore, a recursive learning process is introduced to strengthen the learning ability of the attention network by taking advantage of the learned high-level knowledge. To fully evaluate the proposed method, we conduct experiments from two aspects: manually flipped label noise on large-scale image classification datasets, including CIFAR-10, SVHN; and real-world label noise on an online crawled clothing dataset with multiple attributes. The superior results over state-of-the-art methods validate the effectiveness of our proposed approach.
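
Manually flipped label noise of the kind used in such evaluations can be simulated with a short routine. The sketch below assumes symmetric noise, where each label is replaced with probability `noise_rate` by a different class chosen uniformly at random; the abstract does not state the exact flipping scheme, so this choice and the function name are assumptions.

```python
import random

def flip_labels(labels, num_classes, noise_rate, seed=0):
    """Return a copy of `labels` with symmetric label noise injected.

    Each label is flipped with probability `noise_rate` to a different class
    drawn uniformly from the remaining classes (an assumed noise model).
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            candidates = [c for c in range(num_classes) if c != y]
            noisy.append(rng.choice(candidates))
        else:
            noisy.append(y)
    return noisy
```

With `noise_rate=0.0` the labels are returned unchanged, and with `noise_rate=1.0` every label is replaced by a wrong class, which gives two easy sanity checks before training on the noisy set.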

* 10 pages, 8 figures

Spatial Transformer Point Convolution

Sep 03, 2020

Yuan Fang, Chunyan Xu, Zhen Cui, Yuan Zong, Jian Yang

Abstract: Point clouds are unstructured and unordered in the embedded 3D space. In order to produce consistent responses under different permutation layouts, most existing methods aggregate local spatial points through maximum or summation operation. But such an aggregation essentially belongs to the isotropic filtering on all operated points therein, which tends to lose the information of geometric structures. In this paper, we propose a spatial transformer point convolution (STPC) method to achieve anisotropic convolution filtering on point clouds. To capture and represent implicit geometric structures, we specifically introduce a spatial direction dictionary to learn those latent geometric components. To better encode unordered neighbor points, we design a sparse deformer to transform them into the canonical ordered dictionary space by using direction dictionary learning. In the transformed space, the standard image-like convolution can be leveraged to generate anisotropic filtering, which is more robust to express those finer variances of local regions. Dictionary learning and encoding processes are encapsulated into a network module and jointly learnt in an end-to-end manner. Extensive experiments on several public datasets (including S3DIS, Semantic3D, SemanticKITTI) demonstrate the effectiveness of our proposed method in the point cloud semantic segmentation task.
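
The permutation-invariant aggregation that the abstract attributes to most existing methods can be shown in a few lines. This sketch covers only that baseline (channel-wise maximum over neighbor features), not the proposed STPC; the function name is hypothetical.

```python
def max_aggregate(neighbor_features):
    """Channel-wise maximum over the feature vectors of a point's neighbors."""
    return [max(channel) for channel in zip(*neighbor_features)]
```

Reordering the neighbors leaves the result unchanged, which is the desired consistency under permutation. The same property means the output ignores where each neighbor lies relative to the center point, which is the isotropic behavior the abstract identifies as losing geometric structure.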
