Lili Huang

Semantics Guided Disentangled GAN for Chest X-ray Image Rib Segmentation

Jul 22, 2024

Lili Huang, Dexin Ma, Xiaowei Zhao, Chenglong Li, Haifeng Zhao, Jin Tang, Chuanfu Li

Abstract: Annotating chest X-ray images for rib segmentation is time-consuming and laborious, and the labeling quality relies heavily on the annotators' medical knowledge. To reduce the dependency on annotated data, existing works often use generative adversarial networks (GANs) to generate training data. However, GAN-based methods overlook the nuanced information specific to individual organs, which degrades the quality of the generated chest X-ray images. Hence, we propose a novel Semantics guided Disentangled GAN (SD-GAN) for chest X-ray image rib segmentation, which generates high-quality training data by fully utilizing the semantic information of different organs. In particular, we use three ResNet50 branches to disentangle the features of different organs, and then use a decoder to combine these features and generate the corresponding images. To ensure that the generated images semantically correspond to the input organ labels, we employ a semantics guidance module that applies semantic guidance to the generated images. To evaluate the efficacy of SD-GAN in generating high-quality samples, we introduce a modified TransUNet (MTUNet), a specialized segmentation network designed for multi-scale contextual information extraction and multi-branch decoding, which effectively tackles the challenge of organ overlap. We also propose a new chest X-ray image dataset (CXRS), which includes 1250 samples from various medical institutions. Lungs, clavicles, and 24 ribs are simultaneously annotated on each chest X-ray image. Visualization and quantitative results demonstrate the efficacy of SD-GAN in generating high-quality chest X-ray image-mask pairs. Trained with the generated data, MTUNet overcomes the limitations of data scale and outperforms other segmentation networks.
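
Because lungs, clavicles, and ribs are annotated on the same image and overlap, a single label map with one class per pixel cannot represent them. The sketch below is a hedged illustration, not the CXRS storage format or the authors' evaluation code: it keeps one binary channel per structure so a pixel can be both "lung" and "rib", and scores each channel with the Dice coefficient, a common segmentation metric that the abstract does not name. The channel names, the 4x4 toy masks, and the helpers `dice` and `overlap_pixels` are hypothetical.

```python
from typing import Dict, List

Mask = List[List[int]]  # binary mask: 1 = structure present, 0 = absent


def dice(pred: Mask, target: Mask, eps: float = 1e-7) -> float:
    """Dice coefficient 2|P & T| / (|P| + |T|) between two binary masks."""
    inter = sum(p & t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    return (2.0 * inter + eps) / (total + eps)


def overlap_pixels(a: Mask, b: Mask) -> int:
    """Pixels labeled in both masks; a single argmax label map cannot store these."""
    return sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical 4x4 ground truth with one channel per structure (channels may overlap).
target: Dict[str, Mask] = {
    "lung":  [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]],
    "rib_1": [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]],
}
# Hypothetical prediction: the lung channel misses one pixel in row 2.
pred: Dict[str, Mask] = {
    "lung":  [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 0]],
    "rib_1": [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]],
}

shared = overlap_pixels(target["lung"], target["rib_1"])
scores = {name: dice(pred[name], target[name]) for name in target}
print(shared, {name: round(s, 3) for name, s in scores.items()})
```

Scoring each channel independently is one simple way to keep overlapping structures from competing for the same pixel; how MTUNet's multi-branch decoder actually separates them is not described in the abstract.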

SequencePAR: Understanding Pedestrian Attributes via A Sequence Generation Paradigm

Dec 04, 2023

Jiandong Jin, Xiao Wang, Chenglong Li, Lili Huang, Jin Tang

Abstract: Current pedestrian attribute recognition (PAR) algorithms are built on multi-label or multi-task learning frameworks, which discriminate attributes using specific classification heads. However, these discriminative models are easily affected by imbalanced data or noisy samples. Inspired by the success of generative models, we rethink the pedestrian attribute recognition scheme and believe that generative models may better model the dependencies and complexity among human attributes. In this paper, we propose a novel sequence generation paradigm for pedestrian attribute recognition, termed SequencePAR. It extracts pedestrian features using a pre-trained CLIP model and embeds the attribute set into query tokens under the guidance of text prompts. A Transformer decoder then generates the human attributes by incorporating the visual features and the attribute query tokens. A masked multi-head attention layer is introduced into the decoder module to prevent the model from seeing the next attribute when making attribute predictions during training. Extensive experiments on multiple widely used pedestrian attribute recognition datasets validate the effectiveness of the proposed SequencePAR. The source code and pre-trained models will be released at https://github.com/Event-AHU/OpenPAR.
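
The masked attention mentioned above is, in standard Transformer decoders, a look-ahead (causal) mask: when predicting the attribute at step i, scores for later positions j > i are set to negative infinity before the softmax, so those positions receive zero weight. The sketch below is a single-head, pure-Python illustration of that standard mechanism under these assumptions; it is not the SequencePAR implementation (the abstract points to the OpenPAR repository for that), and the toy token embeddings are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]  # exp(-inf) == 0.0, so masked slots vanish
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product attention with a look-ahead (causal) mask."""
    n, d = len(q), len(q[0])
    weights: Matrix = []
    for i in range(n):
        scores = [
            float("-inf") if j > i  # position j comes later: hide it from step i
            else sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            for j in range(n)
        ]
        weights.append(softmax(scores))
    out = [
        [sum(weights[i][j] * v[j][c] for j in range(n)) for c in range(len(v[0]))]
        for i in range(n)
    ]
    return out, weights


# Three hypothetical attribute tokens with 2-d embeddings (q = k = v for brevity).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = causal_attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

The printed weight matrix is lower-triangular: the first token attends only to itself, and no row places weight on a later position.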

* In Peer Review

Visual Object Tracking by Segmentation with Graph Convolutional Network

Sep 08, 2020

Bo Jiang, Panpan Zhang, Lili Huang

Abstract: Segmentation-based tracking has been actively studied in computer vision and multimedia. Superpixel-based object segmentation and tracking methods are usually developed for this task. However, these methods perform feature representation and learning of superpixels independently, which may lead to sub-optimal results. In this paper, we propose to utilize a graph convolutional network (GCN) model for superpixel-based object tracking. The proposed model provides a general end-to-end framework that integrates (i) linear label prediction and (ii) structure-aware feature information of each superpixel to obtain the object segmentation, which further improves tracking performance. The proposed GCN method has two main benefits. First, it provides an effective end-to-end way to exploit both spatial and temporal consistency constraints for target object segmentation. Second, it uses a mixed graph convolution module to learn context-aware and discriminative features for superpixel representation and labeling. An effective algorithm is developed to optimize the proposed model. Extensive experiments on five datasets show that our method outperforms existing alternative methods.
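
The abstract does not specify the "mixed graph convolution module", so the sketch below swaps in the widely used GCN propagation rule of Kipf and Welling, H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W), applied to a graph whose nodes are superpixels and whose edges link spatially adjacent superpixels. It is meant only to show how a graph convolution mixes each superpixel's feature with its neighbours' features. The three-superpixel graph, the features `h`, and the fixed weights `w` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution: ReLU(A_norm @ H @ W) over superpixel nodes."""
    aggregated = matmul(normalized_adjacency(adj), h)  # mix each node with its neighbours
    return [[max(0.0, x) for x in row] for row in matmul(aggregated, w)]


# Hypothetical graph of 3 superpixels: 0-1 and 1-2 are spatially adjacent.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 2-d feature per superpixel
w = [[1.0, -1.0], [0.5, 1.0]]             # learnable weights (fixed here)
h_next = gcn_layer(adj, h, w)
print([[round(x, 3) for x in row] for row in h_next])
```

Adding the identity before normalizing keeps each superpixel's own feature in the aggregate; the symmetric normalization keeps high-degree superpixels from dominating.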

Learning Deep Representations for Semantic Image Parsing: a Comprehensive Overview

Oct 10, 2018

Lili Huang, Jiefeng Peng, Ruimao Zhang, Guanbin Li, Liang Lin

Abstract: Semantic image parsing, which refers to decomposing images into semantic regions and constructing a structural representation of the input, has recently attracted widespread interest in computer vision. The recent application of deep representation learning has driven this field into a new stage of development. In this paper, we summarize research progress on semantic image parsing in three aspects: category-level semantic segmentation, instance-level semantic segmentation, and beyond segmentation. Specifically, we first review the general frameworks for each task and introduce the relevant variants. The advantages and limitations of each method are also discussed. Moreover, we present a comprehensive comparison of different benchmark datasets and evaluation metrics. Finally, we explore future trends and challenges of semantic image parsing.
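
One evaluation metric widely used for category-level semantic segmentation is mean intersection-over-union (mIoU): for each class c, IoU_c = TP / (TP + FP + FN), averaged over classes. The sketch below computes it from a confusion matrix in plain Python as a generic illustration; it is not taken from the survey, and the 3-class, 6-pixel label maps are hypothetical.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], gt: Sequence[int], num_classes: int) -> List[List[int]]:
    """cm[g][p] counts pixels with ground-truth class g predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, g in zip(pred, gt):
        cm[g][p] += 1
    return cm


def per_class_iou(cm: List[List[int]]) -> List[float]:
    """IoU_c = TP / (TP + FP + FN); NaN for classes absent from both maps."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class c predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp  # other classes predicted as c
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


def mean_iou(ious: List[float]) -> float:
    valid = [x for x in ious if x == x]  # NaN != NaN, so this drops undefined classes
    return sum(valid) / len(valid) if valid else float("nan")


# Hypothetical flattened label maps with 3 classes (6 pixels).
gt = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(pred, gt, num_classes=3)
ious = per_class_iou(cm)
print(cm, [round(x, 3) for x in ious], round(mean_iou(ious), 3))
```

Accumulating a confusion matrix first, rather than per-image IoU, is a common choice because it lets the metric be computed over a whole dataset in one pass.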

Knowledge-Guided Recurrent Neural Network Learning for Task-Oriented Action Prediction

Jul 15, 2017

Liang Lin, Lili Huang, Tianshui Chen, Yukang Gan, Hui Cheng

Abstract: This paper addresses task-oriented action prediction, i.e., predicting a sequence of actions that accomplishes a specific task in a given scene, which is a new problem in computer vision research. The main challenges lie in how to model task-specific knowledge and integrate it into the learning procedure. In this work, we propose to train a recurrent long short-term memory (LSTM) network to handle this problem, i.e., taking a scene image (including pre-located objects) and the specified task as input and recurrently predicting action sequences. However, training such a network usually requires large amounts of annotated samples to cover the semantic space (e.g., diverse action decompositions and orderings). To alleviate this issue, we introduce a temporal And-Or graph (AOG) for task description, which hierarchically decomposes a task into atomic actions. With this AOG representation, we can produce many valid samples (i.e., action sequences consistent with common sense) by training another auxiliary LSTM network with a small set of annotated samples. These generated samples (i.e., task-oriented action sequences) effectively facilitate training the model for task-oriented action prediction. In the experiments, we create a new dataset containing diverse daily tasks and extensively evaluate the effectiveness of our approach.
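
To make the temporal And-Or graph concrete: an And node requires all of its children in temporal order, an Or node chooses exactly one alternative, and leaves are atomic actions. The sketch below exhaustively expands a small AOG into every valid action sequence. This is a hedged illustration of the AOG structure only; the paper instead trains an auxiliary LSTM to produce samples, and the "make a drink" task and the names `Leaf`, `And`, `Or`, and `expand` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Union


@dataclass
class Leaf:
    action: str  # an atomic action


@dataclass
class And:
    children: List["Node"]  # all children are executed, in temporal order


@dataclass
class Or:
    children: List["Node"]  # exactly one alternative is chosen


Node = Union[Leaf, And, Or]


def expand(node: Node) -> List[List[str]]:
    """Enumerate every action sequence the And-Or graph permits."""
    if isinstance(node, Leaf):
        return [[node.action]]
    if isinstance(node, Or):
        return [seq for child in node.children for seq in expand(child)]
    # And node: pick one expansion per child and concatenate them in order.
    combos = product(*(expand(child) for child in node.children))
    return [[a for part in combo for a in part] for combo in combos]


# Hypothetical "make a drink" task with one choice point.
task = And([
    Leaf("get cup"),
    Or([Leaf("pour water"), And([Leaf("open fridge"), Leaf("pour milk")])]),
    Leaf("drink"),
])
for seq in expand(task):
    print(" -> ".join(seq))
```

Exhaustive expansion grows multiplicatively with the number of Or nodes, so it only suits small graphs like this one; that growth is one reason a learned generator is attractive for larger task descriptions.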