Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Haigen Hu

MaskAnyNet: Rethinking Masked Image Regions as Valuable Information in Supervised Learning

Nov 16, 2025

Jingshan Hong, Haigen Hu, Huihuang Zhang, Qianwei Zhou, Zhao Li

Abstract:In supervised learning, traditional image masking faces two key issues: (i) discarded pixels are underutilized, leading to a loss of valuable contextual information; (ii) masking may remove small or critical features, especially in fine-grained tasks. In contrast, masked image modeling (MIM) has demonstrated that masked regions can be reconstructed from partial input, revealing that even incomplete data can exhibit strong contextual consistency with the original image. This highlights the potential of masked regions as sources of semantic diversity. Motivated by this, we revisit the image masking approach, proposing to treat masked content as auxiliary knowledge rather than ignored. Based on this, we propose MaskAnyNet, which combines masking with a relearning mechanism to exploit both visible and masked information. It can be easily extended to any model with an additional branch to jointly learn from the recomposed masked region. This approach leverages the semantic diversity of the masked regions to enrich features and preserve fine-grained details. Experiments on CNN and Transformer backbones show consistent gains across multiple benchmarks. Further analysis confirms that the proposed method improves semantic diversity through the reuse of masked content.

Via

Access Paper or Ask Questions

MHAF-YOLO: Multi-Branch Heterogeneous Auxiliary Fusion YOLO for accurate object detection

Feb 07, 2025

Zhiqiang Yang, Qiu Guan, Zhongwen Yu, Xinli Xu, Haixia Long, Sheng Lian, Haigen Hu, Ying Tang

Abstract:Due to the effective multi-scale feature fusion capabilities of the Path Aggregation FPN (PAFPN), it has become a widely adopted component in YOLO-based detectors. However, PAFPN struggles to integrate high-level semantic cues with low-level spatial details, limiting its performance in real-world applications, especially with significant scale variations. In this paper, we propose MHAF-YOLO, a novel detection framework featuring a versatile neck design called the Multi-Branch Auxiliary FPN (MAFPN), which consists of two key modules: the Superficial Assisted Fusion (SAF) and Advanced Assisted Fusion (AAF). The SAF bridges the backbone and the neck by fusing shallow features, effectively transferring crucial low-level spatial information with high fidelity. Meanwhile, the AAF integrates multi-scale feature information at deeper neck layers, delivering richer gradient information to the output layer and further enhancing the model learning capacity. To complement MAFPN, we introduce the Global Heterogeneous Flexible Kernel Selection (GHFKS) mechanism and the Reparameterized Heterogeneous Multi-Scale (RepHMS) module to enhance feature fusion. RepHMS is globally integrated into the network, utilizing GHFKS to select larger convolutional kernels for various feature layers, expanding the vertical receptive field and capturing contextual information across spatial hierarchies. Locally, it optimizes convolution by processing both large and small kernels within the same layer, broadening the lateral receptive field and preserving crucial details for detecting smaller targets. The source code of this work is available at: https://github.com/yang0201/MHAF-YOLO.

* arXiv admin note: text overlap with arXiv:2407.04381

Via

Access Paper or Ask Questions

IAFI-FCOS: Intra- and across-layer feature interaction FCOS model for lesion detection of CT images

Sep 01, 2024

Qiu Guan, Mengjie Pan, Feng Chen, Zhiqiang Yang, Zhongwen Yu, Qianwei Zhou, Haigen Hu

Abstract:Effective lesion detection in medical image is not only rely on the features of lesion region,but also deeply relative to the surrounding information.However,most current methods have not fully utilize it.What is more,multi-scale feature fusion mechanism of most traditional detectors are unable to transmit detail information without loss,which makes it hard to detect small and boundary ambiguous lesion in early stage disease.To address the above issues,we propose a novel intra- and across-layer feature interaction FCOS model (IAFI-FCOS) with a multi-scale feature fusion mechanism ICAF-FPN,which is a network structure with intra-layer context augmentation (ICA) block and across-layer feature weighting (AFW) block.Therefore,the traditional FCOS detector is optimized by enriching the feature representation from two perspectives.Specifically,the ICA block utilizes dilated attention to augment the context information in order to capture long-range dependencies between the lesion region and the surrounding.The AFW block utilizes dual-axis attention mechanism and weighting operation to obtain the efficient across-layer interaction features,enhancing the representation of detailed features.Our approach has been extensively experimented on both the private pancreatic lesion dataset and the public DeepLesion dataset,our model achieves SOTA results on the pancreatic lesion dataset.

* 2024 IJCNN

Via

Access Paper or Ask Questions

DuEDL: Dual-Branch Evidential Deep Learning for Scribble-Supervised Medical Image Segmentation

May 23, 2024

Yitong Yang, Xinli Xu, Haigen Hu, Haixia Long, Qianwei Zhou, Qiu Guan

Figure 1 for DuEDL: Dual-Branch Evidential Deep Learning for Scribble-Supervised Medical Image Segmentation

Figure 2 for DuEDL: Dual-Branch Evidential Deep Learning for Scribble-Supervised Medical Image Segmentation

Figure 3 for DuEDL: Dual-Branch Evidential Deep Learning for Scribble-Supervised Medical Image Segmentation

Figure 4 for DuEDL: Dual-Branch Evidential Deep Learning for Scribble-Supervised Medical Image Segmentation

Abstract:Despite the recent progress in medical image segmentation with scribble-based annotations, the segmentation results of most models are still not ro-bust and generalizable enough in open environments. Evidential deep learn-ing (EDL) has recently been proposed as a promising solution to model predictive uncertainty and improve the reliability of medical image segmen-tation. However directly applying EDL to scribble-supervised medical im-age segmentation faces a tradeoff between accuracy and reliability. To ad-dress the challenge, we propose a novel framework called Dual-Branch Evi-dential Deep Learning (DuEDL). Firstly, the decoder of the segmentation network is changed to two different branches, and the evidence of the two branches is fused to generate high-quality pseudo-labels. Then the frame-work applies partial evidence loss and two-branch consistent loss for joint training of the model to adapt to the scribble supervision learning. The pro-posed method was tested on two cardiac datasets: ACDC and MSCMRseg. The results show that our method significantly enhances the reliability and generalization ability of the model without sacrificing accuracy, outper-forming state-of-the-art baselines. The code is available at https://github.com/Gardnery/DuEDL.

* 14 pages, 2 figures

Via

Access Paper or Ask Questions

Deep Co-supervision and Attention Fusion Strategy for Automatic COVID-19 Lung Infection Segmentation on CT Images

Dec 20, 2021

Haigen Hu, Leizhao Shen, Qiu Guan, Xiaoxin Li, Qianwei Zhou, Su Ruan

Figure 1 for Deep Co-supervision and Attention Fusion Strategy for Automatic COVID-19 Lung Infection Segmentation on CT Images

Figure 2 for Deep Co-supervision and Attention Fusion Strategy for Automatic COVID-19 Lung Infection Segmentation on CT Images

Figure 3 for Deep Co-supervision and Attention Fusion Strategy for Automatic COVID-19 Lung Infection Segmentation on CT Images

Figure 4 for Deep Co-supervision and Attention Fusion Strategy for Automatic COVID-19 Lung Infection Segmentation on CT Images

Abstract:Due to the irregular shapes,various sizes and indistinguishable boundaries between the normal and infected tissues, it is still a challenging task to accurately segment the infected lesions of COVID-19 on CT images. In this paper, a novel segmentation scheme is proposed for the infections of COVID-19 by enhancing supervised information and fusing multi-scale feature maps of different levels based on the encoder-decoder architecture. To this end, a deep collaborative supervision (Co-supervision) scheme is proposed to guide the network learning the features of edges and semantics. More specifically, an Edge Supervised Module (ESM) is firstly designed to highlight low-level boundary features by incorporating the edge supervised information into the initial stage of down-sampling. Meanwhile, an Auxiliary Semantic Supervised Module (ASSM) is proposed to strengthen high-level semantic information by integrating mask supervised information into the later stage. Then an Attention Fusion Module (AFM) is developed to fuse multiple scale feature maps of different levels by using an attention mechanism to reduce the semantic gaps between high-level and low-level feature maps. Finally, the effectiveness of the proposed scheme is demonstrated on four various COVID-19 CT datasets. The results show that the proposed three modules are all promising. Based on the baseline (ResUnet), using ESM, ASSM, or AFM alone can respectively increase Dice metric by 1.12\%, 1.95\%,1.63\% in our dataset, while the integration by incorporating three models together can rise 3.97\%. Compared with the existing approaches in various datasets, the proposed method can obtain better segmentation performance in some main metrics, and can achieve the best generalization and comprehensive performance.

* Pattern Recognition,2022,124:108452

Via

Access Paper or Ask Questions

Adaptively Customizing Activation Functions for Various Layers

Dec 17, 2021

Haigen Hu, Aizhu Liu, Qiu Guan, Xiaoxin Li, Shengyong Chen, Qianwei Zhou

Figure 1 for Adaptively Customizing Activation Functions for Various Layers

Figure 2 for Adaptively Customizing Activation Functions for Various Layers

Figure 3 for Adaptively Customizing Activation Functions for Various Layers

Figure 4 for Adaptively Customizing Activation Functions for Various Layers

Abstract:To enhance the nonlinearity of neural networks and increase their mapping abilities between the inputs and response variables, activation functions play a crucial role to model more complex relationships and patterns in the data. In this work, a novel methodology is proposed to adaptively customize activation functions only by adding very few parameters to the traditional activation functions such as Sigmoid, Tanh, and ReLU. To verify the effectiveness of the proposed methodology, some theoretical and experimental analysis on accelerating the convergence and improving the performance is presented, and a series of experiments are conducted based on various network models (such as AlexNet, VGGNet, GoogLeNet, ResNet and DenseNet), and various datasets (such as CIFAR10, CIFAR100, miniImageNet, PASCAL VOC and COCO) . To further verify the validity and suitability in various optimization strategies and usage scenarios, some comparison experiments are also implemented among different optimization strategies (such as SGD, Momentum, AdaGrad, AdaDelta and ADAM) and different recognition tasks like classification and detection. The results show that the proposed methodology is very simple but with significant performance in convergence speed, precision and generalization, and it can surpass other popular methods like ReLU and adaptive functions like Swish in almost all experiments in terms of overall performance.The code is publicly available at https://github.com/HuHaigen/Adaptively-Customizing-Activation-Functions. The package includes the proposed three adaptive activation functions for reproducibility purposes.

Via

Access Paper or Ask Questions

Training Deep Neural Networks for Wireless Sensor Networks Using Loosely and Weakly Labeled Images

Oct 06, 2020

Qianwei Zhou, Yuhang Chen, Baoqing Li, Xiaoxin Li, Chen Zhou, Jingchang Huang, Haigen Hu

Figure 1 for Training Deep Neural Networks for Wireless Sensor Networks Using Loosely and Weakly Labeled Images

Figure 2 for Training Deep Neural Networks for Wireless Sensor Networks Using Loosely and Weakly Labeled Images

Figure 3 for Training Deep Neural Networks for Wireless Sensor Networks Using Loosely and Weakly Labeled Images

Figure 4 for Training Deep Neural Networks for Wireless Sensor Networks Using Loosely and Weakly Labeled Images

Abstract:Although deep learning has achieved remarkable successes over the past years, few reports have been published about applying deep neural networks to Wireless Sensor Networks (WSNs) for image targets recognition where data, energy, computation resources are limited. In this work, a Cost-Effective Domain Generalization (CEDG) algorithm has been proposed to train an efficient network with minimum labor requirements. CEDG transfers networks from a publicly available source domain to an application-specific target domain through an automatically allocated synthetic domain. The target domain is isolated from parameters tuning and used for model selection and testing only. The target domain is significantly different from the source domain because it has new target categories and is consisted of low-quality images that are out of focus, low in resolution, low in illumination, low in photographing angle. The trained network has about 7M (ResNet-20 is about 41M) multiplications per prediction that is small enough to allow a digital signal processor chip to do real-time recognitions in our WSN. The category-level averaged error on the unseen and unbalanced target domain has been decreased by 41.12%.

Via

Access Paper or Ask Questions

Residual-Recursion Autoencoder for Shape Illustration Images

Feb 06, 2020

Qianwei Zhou, Peng Tao, Xiaoxin Li, Shengyong Chen, Fan Zhang, Haigen Hu

Figure 1 for Residual-Recursion Autoencoder for Shape Illustration Images

Figure 2 for Residual-Recursion Autoencoder for Shape Illustration Images

Figure 3 for Residual-Recursion Autoencoder for Shape Illustration Images

Figure 4 for Residual-Recursion Autoencoder for Shape Illustration Images

Abstract:Shape illustration images (SIIs) are common and important in describing the cross-sections of industrial products. Same as MNIST, the handwritten digit images, SIIs are gray or binary and containing shapes that are surrounded by large areas of blanks. In this work, Residual-Recursion Autoencoder (RRAE) has been proposed to extract low-dimensional features from SIIs while maintaining reconstruction accuracy as high as possible. RRAE will try to reconstruct the original image several times and recursively fill the latest residual image to the reserved channel of the encoder's input before the next trial of reconstruction. As a kind of neural network training framework, RRAE can wrap over other autoencoders and increase their performance. From experiment results, the reconstruction loss is decreased by 86.47% for convolutional autoencoder with high-resolution SIIs, 10.77% for variational autoencoder and 8.06% for conditional variational autoencoder with MNIST.

Via

Access Paper or Ask Questions

Towards the Automation of Deep Image Prior

Nov 17, 2019

Qianwei Zhou, Chen Zhou, Haigen Hu, Yuhang Chen, Shengyong Chen, Xiaoxin Li

Figure 1 for Towards the Automation of Deep Image Prior

Figure 2 for Towards the Automation of Deep Image Prior

Figure 3 for Towards the Automation of Deep Image Prior

Figure 4 for Towards the Automation of Deep Image Prior

Abstract:Single image inverse problem is a notoriously challenging ill-posed problem that aims to restore the original image from one of its corrupted versions. Recently, this field has been immensely influenced by the emergence of deep-learning techniques. Deep Image Prior (DIP) offers a new approach that forces the recovered image to be synthesized from a given deep architecture. While DIP is quite an effective unsupervised approach, it is deprecated in real-world applications because of the requirement of human assistance. In this work, we aim to find the best-recovered image without the assistance of humans by adding a stopping criterion, which will reach maximum when the iteration no longer improves the image quality. More specifically, we propose to add a pseudo noise to the corrupted image and measure the pseudo-noise component in the recovered image by the orthogonality between signal and noise. The accuracy of the orthogonal stopping criterion has been demonstrated for several tested problems such as denoising, super-resolution, and inpainting, in which 38 out of 40 experiments are higher than 95%.

Via

Access Paper or Ask Questions