Enbo Huang

DRDM: A Disentangled Representations Diffusion Model for Synthesizing Realistic Person Images

Dec 25, 2024

Enbo Huang, Yuan Zhang, Faliang Huang, Guangyu Zhang, Yang Liu

Abstract: Person image synthesis with controllable body poses and appearances is an essential task owing to practical needs in virtual try-on, image editing, and video production. However, existing methods face significant challenges, including missing details, distorted limbs, and deviations in garment style. To address these issues, we propose a Disentangled Representations Diffusion Model (DRDM) to generate photo-realistic images from source portraits in specific desired poses and appearances. First, a pose encoder encodes pose features into a high-dimensional space to guide the generation of person images. Second, a body-part subspace decoupling block (BSDB) disentangles features from the different body parts of a source figure and feeds them to the various layers of the noise prediction block, thereby supplying the network with rich disentangled features for generating a realistic target image. Moreover, during inference, we develop a parsing-map-based disentangled classifier-free guidance sampling method, which amplifies the conditional signals of texture and pose. Extensive experimental results on the DeepFashion dataset demonstrate the effectiveness of our approach in achieving pose transfer and appearance control.
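The abstract does not give DRDM's guidance formula. As a rough illustration only, the sketch below shows a generic two-condition classifier-free guidance combination (the nested form popularized by InstructPix2Pix), with separate weights for the pose and texture signals. It assumes the denoiser can be queried three times per step: with no conditions, with pose only, and with pose plus texture. The parsing-map decomposition is not modeled, and the function name, argument names, and weights are hypothetical.

```python
import numpy as np

def disentangled_cfg(eps_uncond, eps_pose, eps_full, w_pose=2.0, w_tex=2.0):
    """Combine three noise predictions into one guided estimate (hypothetical helper).

    eps_uncond: prediction with both pose and texture conditions dropped
    eps_pose:   prediction conditioned on the target pose only
    eps_full:   prediction conditioned on the target pose and the source texture
    """
    # Move from the unconditional estimate toward the pose-conditioned one.
    pose_term = w_pose * (eps_pose - eps_uncond)
    # Move further toward the estimate that also sees the texture condition.
    tex_term = w_tex * (eps_full - eps_pose)
    return eps_uncond + pose_term + tex_term

# Toy usage with constant arrays standing in for network outputs.
guided = disentangled_cfg(np.zeros(3), np.ones(3), np.full(3, 2.0), w_pose=2.0, w_tex=3.0)
print(guided)  # [5. 5. 5.]
```

With both weights set to 1 the combination reduces to the fully conditioned prediction, and larger weights amplify the pose and texture signals independently, which is the kind of control the abstract attributes to the sampling method.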

VisionGRU: A Linear-Complexity RNN Model for Efficient Image Analysis

Dec 24, 2024

Shicheng Yin, Kaixuan Yin, Weixing Chen, Enbo Huang, Yang Liu

Abstract: Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) are the two dominant model families for image analysis. While CNNs excel at extracting multi-scale features and ViTs effectively capture global dependencies, both suffer from high computational costs, particularly when processing high-resolution images. Recently, state-space models (SSMs) and recurrent neural networks (RNNs) have attracted attention due to their efficiency. However, their performance on image classification tasks remains limited. To address these challenges, this paper introduces VisionGRU, a novel RNN-based architecture designed for efficient image classification. VisionGRU leverages a simplified Gated Recurrent Unit (minGRU) to process large-scale image features with linear complexity. It divides images into smaller patches and progressively reduces the sequence length while increasing the channel depth, thus facilitating multi-scale feature extraction. A hierarchical 2DGRU module with bidirectional scanning captures both local and global contexts, improving long-range dependency modeling, particularly for tasks like semantic segmentation. Experimental results on the ImageNet and ADE20K datasets demonstrate that VisionGRU outperforms ViTs while significantly reducing memory usage and computational costs, especially for high-resolution images. These findings underscore the potential of RNN-based approaches for developing efficient and scalable computer vision solutions. Code will be available at https://github.com/YangLiu9208/VisionGRU.
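Two ideas in the abstract can be made concrete with a small example: turning an image into a patch sequence, and a minGRU recurrence whose cost grows linearly with sequence length, scanned in both directions. The sketch below is a NumPy illustration under stated assumptions, not VisionGRU's implementation. It uses the standard minGRU update h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t, where the gate z_t and candidate h̃_t depend only on the input x_t; it runs the recurrence as a plain sequential loop rather than a parallel scan; and it merges the two scan directions by summation. The hierarchical stages, the 2D scanning order of the 2DGRU module, the random weights, and all function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def patchify(image, patch):
    """Split an (H, W, C) image into a (num_patches, patch*patch*C) sequence, row-major."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (grid_h, grid_w, patch, patch, C)
    return x.reshape(-1, patch * patch * c)

def min_gru(x, w_z, b_z, w_h, b_h, h0=None):
    """minGRU over a (T, d_in) sequence; returns the (T, d_hidden) hidden states.

    The gate and candidate read only x_t (not h_{t-1}), so both are computed for
    all positions in one matrix product; the loop is elementwise and O(T).
    """
    z = sigmoid(x @ w_z + b_z)   # (T, d_hidden) update gates
    h_tilde = x @ w_h + b_h      # (T, d_hidden) candidate states
    h = np.zeros(w_h.shape[1]) if h0 is None else h0
    out = np.empty_like(h_tilde)
    for t in range(x.shape[0]):
        h = (1.0 - z[t]) * h + z[t] * h_tilde[t]
        out[t] = h
    return out

def bidirectional_min_gru(x, fwd_params, bwd_params):
    """Scan forward and backward with separate weights, then sum (assumed merge rule)."""
    forward = min_gru(x, *fwd_params)
    backward = min_gru(x[::-1], *bwd_params)[::-1]  # re-align to the original order
    return forward + backward

# Usage: an 8x8 RGB image cut into 4x4 patches gives a sequence of 4 tokens.
rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8, 3))
seq = patchify(img, 4)           # shape (4, 48)
d_in, d_h = seq.shape[1], 16

def params():
    # Hypothetical random weights: (w_z, b_z, w_h, b_h)
    return (rng.standard_normal((d_in, d_h)) * 0.1, np.zeros(d_h),
            rng.standard_normal((d_in, d_h)) * 0.1, np.zeros(d_h))

y = bidirectional_min_gru(seq, params(), params())
print(y.shape)  # (4, 16)
```

Because neither the gate nor the candidate depends on the previous hidden state, the only sequential work is an elementwise update per token, which keeps the cost linear in the number of patches and also allows the loop to be replaced by a parallel scan. Progressively shortening the sequence while widening the channels, as the abstract describes, would be applied between such stages and is not shown here.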
