Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xin Zuo

Improving underwater semantic segmentation with underwater image quality attention and muti-scale aggregation attention

Mar 30, 2025

Xin Zuo, Jiaran Jiang, Jifeng Shen, Wankou Yang

Abstract:Underwater image understanding is crucial for both submarine navigation and seabed exploration. However, the low illumination in underwater environments degrades the imaging quality, which in turn seriously deteriorates the performance of underwater semantic segmentation, particularly for outlining the object region boundaries. To tackle this issue, we present UnderWater SegFormer (UWSegFormer), a transformer-based framework for semantic segmentation of low-quality underwater images. Firstly, we propose the Underwater Image Quality Attention (UIQA) module. This module enhances the representation of highquality semantic information in underwater image feature channels through a channel self-attention mechanism. In order to address the issue of loss of imaging details due to the underwater environment, the Multi-scale Aggregation Attention(MAA) module is proposed. This module aggregates sets of semantic features at different scales by extracting discriminative information from high-level features,thus compensating for the semantic loss of detail in underwater objects. Finally, during training, we introduce Edge Learning Loss (ELL) in order to enhance the model's learning of underwater object edges and improve the model's prediction accuracy. Experiments conducted on the SUIM and DUT-USEG (DUT) datasets have demonstrated that the proposed method has advantages in terms of segmentation completeness, boundary clarity, and subjective perceptual details when compared to SOTA methods. In addition, the proposed method achieves the highest mIoU of 82.12 and 71.41 on the SUIM and DUT datasets, respectively. Code will be available at https://github.com/SAWRJJ/UWSegFormer.

* Accepted by Pattern Analysis and Applications

Via

Access Paper or Ask Questions

Topology-aware Mamba for Crack Segmentation in Structures

Oct 25, 2024

Xin Zuo, Yu Sheng, Jifeng Shen, Yongwei Shan

Figure 1 for Topology-aware Mamba for Crack Segmentation in Structures

Figure 2 for Topology-aware Mamba for Crack Segmentation in Structures

Figure 3 for Topology-aware Mamba for Crack Segmentation in Structures

Figure 4 for Topology-aware Mamba for Crack Segmentation in Structures

Abstract:CrackMamba, a Mamba-based model, is designed for efficient and accurate crack segmentation for monitoring the structural health of infrastructure. Traditional Convolutional Neural Network (CNN) models struggle with limited receptive fields, and while Vision Transformers (ViT) improve segmentation accuracy, they are computationally intensive. CrackMamba addresses these challenges by utilizing the VMambaV2 with pre-trained ImageNet-1k weights as the encoder and a newly designed decoder for better performance. To handle the random and complex nature of crack development, a Snake Scan module is proposed to reshape crack feature sequences, enhancing feature extraction. Additionally, the three-branch Snake Conv VSS (SCVSS) block is proposed to target cracks more effectively. Experiments show that CrackMamba achieves state-of-the-art (SOTA) performance on the CrackSeg9k and SewerCrack datasets, and demonstrates competitive performance on the retinal vessel segmentation dataset CHASE\underline{~}DB1, highlighting its generalization capability. The code is publicly available at: {https://github.com/shengyu27/CrackMamba.}

* Published at Journal of Automation in Construction

Via

Access Paper or Ask Questions

Multi-label Sewer Pipe Defect Recognition with Mask Attention Feature Enhancement and Label Correlation Learning

Aug 01, 2024

Xin Zuo, Yu Sheng, Jifeng Shen, Yongwei Shan

Figure 1 for Multi-label Sewer Pipe Defect Recognition with Mask Attention Feature Enhancement and Label Correlation Learning

Figure 2 for Multi-label Sewer Pipe Defect Recognition with Mask Attention Feature Enhancement and Label Correlation Learning

Figure 3 for Multi-label Sewer Pipe Defect Recognition with Mask Attention Feature Enhancement and Label Correlation Learning

Figure 4 for Multi-label Sewer Pipe Defect Recognition with Mask Attention Feature Enhancement and Label Correlation Learning

Abstract:The coexistence of multiple defect categories as well as the substantial class imbalance problem significantly impair the detection of sewer pipeline defects. To solve this problem, a multi-label pipe defect recognition method is proposed based on mask attention guided feature enhancement and label correlation learning. The proposed method can achieve current approximate state-of-the-art classification performance using just 1/16 of the Sewer-ML training dataset and exceeds the current best method by 11.87\% in terms of F2 metric on the full dataset, while also proving the superiority of the model. The major contribution of this study is the development of a more efficient model for identifying and locating multiple defects in sewer pipe images for a more accurate sewer pipeline condition assessment. Moreover, by employing class activation maps, our method can accurately pinpoint multiple defect categories in the image which demonstrates a strong model interpretability. Our code is available at \href{https://github.com/shengyu27/MA-Q2L}{\textcolor{black}{https://github.com/shengyu27/MA-Q2L.}

* Accepted by the Journal of Computing in Civil Engineering

Via

Access Paper or Ask Questions

SSPNet: Scale and Spatial Priors Guided Generalizable and Interpretable Pedestrian Attribute Recognition

Dec 11, 2023

Jifeng Shen, Teng Guo, Xin Zuo, Heng Fan, Wankou Yang

Figure 1 for SSPNet: Scale and Spatial Priors Guided Generalizable and Interpretable Pedestrian Attribute Recognition

Figure 2 for SSPNet: Scale and Spatial Priors Guided Generalizable and Interpretable Pedestrian Attribute Recognition

Figure 3 for SSPNet: Scale and Spatial Priors Guided Generalizable and Interpretable Pedestrian Attribute Recognition

Figure 4 for SSPNet: Scale and Spatial Priors Guided Generalizable and Interpretable Pedestrian Attribute Recognition

Abstract:Global feature based Pedestrian Attribute Recognition (PAR) models are often poorly localized when using Grad-CAM for attribute response analysis, which has a significant impact on the interpretability, generalizability and performance. Previous researches have attempted to improve generalization and interpretation through meticulous model design, yet they often have neglected or underutilized effective prior information crucial for PAR. To this end, a novel Scale and Spatial Priors Guided Network (SSPNet) is proposed for PAR, which is mainly composed of the Adaptive Feature Scale Selection (AFSS) and Prior Location Extraction (PLE) modules. The AFSS module learns to provide reasonable scale prior information for different attribute groups, allowing the model to focus on different levels of feature maps with varying semantic granularity. The PLE module reveals potential attribute spatial prior information, which avoids unnecessary attention on irrelevant areas and lowers the risk of model over-fitting. More specifically, the scale prior in AFSS is adaptively learned from different layers of feature pyramid with maximum accuracy, while the spatial priors in PLE can be revealed from part feature with different granularity (such as image blocks, human pose keypoint and sparse sampling points). Besides, a novel IoU based attribute localization metric is proposed for Weakly-supervised Pedestrian Attribute Localization (WPAL) based on the improved Grad-CAM for attribute response mask. The experimental results on the intra-dataset and cross-dataset evaluations demonstrate the effectiveness of our proposed method in terms of mean accuracy (mA). Furthermore, it also achieves superior performance on the PCS dataset for attribute localization in terms of IoU. Code will be released at https://github.com/guotengg/SSPNet.

* 39 pages, 11 figures, Accepted by Pattern Recognition

Via

Access Paper or Ask Questions

ICAFusion: Iterative Cross-Attention Guided Feature Fusion for Multispectral Object Detection

Aug 15, 2023

Jifeng Shen, Yifei Chen, Yue Liu, Xin Zuo, Heng Fan, Wankou Yang

Abstract:Effective feature fusion of multispectral images plays a crucial role in multi-spectral object detection. Previous studies have demonstrated the effectiveness of feature fusion using convolutional neural networks, but these methods are sensitive to image misalignment due to the inherent deffciency in local-range feature interaction resulting in the performance degradation. To address this issue, a novel feature fusion framework of dual cross-attention transformers is proposed to model global feature interaction and capture complementary information across modalities simultaneously. This framework enhances the discriminability of object features through the query-guided cross-attention mechanism, leading to improved performance. However, stacking multiple transformer blocks for feature enhancement incurs a large number of parameters and high spatial complexity. To handle this, inspired by the human process of reviewing knowledge, an iterative interaction mechanism is proposed to share parameters among block-wise multimodal transformers, reducing model complexity and computation cost. The proposed method is general and effective to be integrated into different detection frameworks and used with different backbones. Experimental results on KAIST, FLIR, and VEDAI datasets show that the proposed method achieves superior performance and faster inference, making it suitable for various practical scenarios. Code will be available at https://github.com/chanchanchan97/ICAFusion.

* submitted to Pattern Recognition Journal, minor revision

Via

Access Paper or Ask Questions

Self-supervised Learning for Heterogeneous Graph via Structure Information based on Metapath

Sep 09, 2022

Shuai Ma, Jian-wei Liu, Xin Zuo

Figure 1 for Self-supervised Learning for Heterogeneous Graph via Structure Information based on Metapath

Figure 2 for Self-supervised Learning for Heterogeneous Graph via Structure Information based on Metapath

Figure 3 for Self-supervised Learning for Heterogeneous Graph via Structure Information based on Metapath

Figure 4 for Self-supervised Learning for Heterogeneous Graph via Structure Information based on Metapath

Abstract:graph neural networks (GNNs) are the dominant paradigm for modeling and handling graph structure data by learning universal node representation. The traditional way of training GNNs depends on a great many labeled data, which results in high requirements on cost and time. In some special scene, it is even unavailable and impracticable. Self-supervised representation learning, which can generate labels by graph structure data itself, is a potential approach to tackle this problem. And turning to research on self-supervised learning problem for heterogeneous graphs is more challenging than dealing with homogeneous graphs, also there are fewer studies about it. In this paper, we propose a SElfsupervised learning method for heterogeneous graph via Structure Information based on Metapath (SESIM). The proposed model can construct pretext tasks by predicting jump number between nodes in each metapath to improve the representation ability of primary task. In order to predict jump number, SESIM uses data itself to generate labels, avoiding time-consuming manual labeling. Moreover, predicting jump number in each metapath can effectively utilize graph structure information, which is the essential property between nodes. Therefore, SESIM deepens the understanding of models for graph structure. At last, we train primary task and pretext tasks jointly, and use meta-learning to balance the contribution of pretext tasks for primary task. Empirical results validate the performance of SESIM method and demonstrate that this method can improve the representation ability of traditional neural networks on link prediction task and node classification task.

* 32 pages

Via

Access Paper or Ask Questions

Multi-Scale Iterative Refinement Network for RGB-D Salient Object Detection

Jan 24, 2022

Ze-yu Liu, Jian-wei Liu, Xin Zuo, Ming-fei Hu

Figure 1 for Multi-Scale Iterative Refinement Network for RGB-D Salient Object Detection

Figure 2 for Multi-Scale Iterative Refinement Network for RGB-D Salient Object Detection

Figure 3 for Multi-Scale Iterative Refinement Network for RGB-D Salient Object Detection

Figure 4 for Multi-Scale Iterative Refinement Network for RGB-D Salient Object Detection

Abstract:The extensive research leveraging RGB-D information has been exploited in salient object detection. However, salient visual cues appear in various scales and resolutions of RGB images due to semantic gaps at different feature levels. Meanwhile, similar salient patterns are available in cross-modal depth images as well as multi-scale versions. Cross-modal fusion and multi-scale refinement are still an open problem in RGB-D salient object detection task. In this paper, we begin by introducing top-down and bottom-up iterative refinement architecture to leverage multi-scale features, and then devise attention based fusion module (ABF) to address on cross-modal correlation. We conduct extensive experiments on seven public datasets. The experimental results show the effectiveness of our devised method

* Engineering Applications of Artificial Intelligence(2021)
* 40 pages

Via

Access Paper or Ask Questions

Online Deep Learning based on Auto-Encoder

Jan 19, 2022

Si-si Zhang, Jian-wei Liu, Xin Zuo, Run-kun Lu, Si-ming Lian

Abstract:Online learning is an important technical means for sketching massive real-time and high-speed data. Although this direction has attracted intensive attention, most of the literature in this area ignore the following three issues: (1) they think little of the underlying abstract hierarchical latent information existing in examples, even if extracting these abstract hierarchical latent representations is useful to better predict the class labels of examples; (2) the idea of preassigned model on unseen datapoints is not suitable for modeling streaming data with evolving probability distribution. This challenge is referred as model flexibility. And so, with this in minds, the online deep learning model we need to design should have a variable underlying structure; (3) moreover, it is of utmost importance to fusion these abstract hierarchical latent representations to achieve better classification performance, and we should give different weights to different levels of implicit representation information when dealing with the data streaming where the data distribution changes. To address these issues, we propose a two-phase Online Deep Learning based on Auto-Encoder (ODLAE). Based on auto-encoder, considering reconstruction loss, we extract abstract hierarchical latent representations of instances; Based on predictive loss, we devise two fusion strategies: the output-level fusion strategy, which is obtained by fusing the classification results of encoder each hidden layer; and feature-level fusion strategy, which is leveraged self-attention mechanism to fusion every hidden layer output. Finally, in order to improve the robustness of the algorithm, we also try to utilize the denoising auto-encoder to yield hierarchical latent representations. Experimental results on different datasets are presented to verify the validity of our proposed algorithm (ODLAE) outperforms several baselines.

* Applied Intelligence (2021)
* 30 pages

Via

Access Paper or Ask Questions

Multi-View representation learning in Multi-Task Scene

Jan 15, 2022

Run-kun Lu, Jian-wei Liu, Si-ming Lian, Xin Zuo

Figure 1 for Multi-View representation learning in Multi-Task Scene

Figure 2 for Multi-View representation learning in Multi-Task Scene

Figure 3 for Multi-View representation learning in Multi-Task Scene

Figure 4 for Multi-View representation learning in Multi-Task Scene

Abstract:Over recent decades have witnessed considerable progress in whether multi-task learning or multi-view learning, but the situation that consider both learning scenes simultaneously has received not too much attention. How to utilize multiple views latent representation of each single task to improve each learning task performance is a challenge problem. Based on this, we proposed a novel semi-supervised algorithm, termed as Multi-Task Multi-View learning based on Common and Special Features (MTMVCSF). In general, multi-views are the different aspects of an object and every view includes the underlying common or special information of this object. As a consequence, we will mine multiple views jointly latent factor of each learning task which consists of each view special feature and the common feature of all views. By this way, the original multi-task multi-view data has degenerated into multi-task data, and exploring the correlations among multiple tasks enables to make an improvement on the performance of learning algorithm. Another obvious advantage of this approach is that we get latent representation of the set of unlabeled instances by the constraint of regression task with labeled instances. The performance of classification and semi-supervised clustering task in these latent representations perform obviously better than it in raw data. Furthermore, an anti-noise multi-task multi-view algorithm called AN-MTMVCSF is proposed, which has a strong adaptability to noise labels. The effectiveness of these algorithms is proved by a series of well-designed experiments on both real world and synthetic data.

* Neural Computing and Applications(2020)
* 32 pages

Via

Access Paper or Ask Questions

Auto-Encoder based Co-Training Multi-View Representation Learning

Jan 09, 2022

Run-kun Lu, Jian-wei Liu, Yuan-fang Wang, Hao-jie Xie, Xin Zuo

Figure 1 for Auto-Encoder based Co-Training Multi-View Representation Learning

Figure 2 for Auto-Encoder based Co-Training Multi-View Representation Learning

Figure 3 for Auto-Encoder based Co-Training Multi-View Representation Learning

Figure 4 for Auto-Encoder based Co-Training Multi-View Representation Learning

Abstract:Multi-view learning is a learning problem that utilizes the various representations of an object to mine valuable knowledge and improve the performance of learning algorithm, and one of the significant directions of multi-view learning is sub-space learning. As we known, auto-encoder is a method of deep learning, which can learn the latent feature of raw data by reconstructing the input, and based on this, we propose a novel algorithm called Auto-encoder based Co-training Multi-View Learning (ACMVL), which utilizes both complementarity and consistency and finds a joint latent feature representation of multiple views. The algorithm has two stages, the first is to train auto-encoder of each view, and the second stage is to train a supervised network. Interestingly, the two stages share the weights partly and assist each other by co-training process. According to the experimental result, we can learn a well performed latent feature representation, and auto-encoder of each view has more powerful reconstruction ability than traditional auto-encoder.

Via

Access Paper or Ask Questions