Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Huaqi Zhang

HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions

May 29, 2025

Shuolin Xu, Siming Zheng, Ziyi Wang, HC Yu, Jinwei Chen, Huaqi Zhang, Bo Li, Peng-Tao Jiang

Abstract:Recent advances in diffusion models have significantly improved conditional video generation, particularly in the pose-guided human image animation task. Although existing methods are capable of generating high-fidelity and time-consistent animation sequences in regular motions and static scenes, there are still obvious limitations when facing complex human body motions (Hypermotion) that contain highly dynamic, non-standard motions, and the lack of a high-quality benchmark for evaluation of complex human motion animations. To address this challenge, we introduce the \textbf{Open-HyperMotionX Dataset} and \textbf{HyperMotionX Bench}, which provide high-quality human pose annotations and curated video clips for evaluating and improving pose-guided human image animation models under complex human motion conditions. Furthermore, we propose a simple yet powerful DiT-based video generation baseline and design spatial low-frequency enhanced RoPE, a novel module that selectively enhances low-frequency spatial feature modeling by introducing learnable frequency scaling. Our method significantly improves structural stability and appearance consistency in highly dynamic human motion sequences. Extensive experiments demonstrate the effectiveness of our dataset and proposed approach in advancing the generation quality of complex human motion image animations. Code and dataset will be made publicly available.

* 17 pages, 7 figures

Via

Access Paper or Ask Questions

CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models

Dec 17, 2024

Gaoyang Zhang, Bingtao Fu, Qingnan Fan, Qi Zhang, Runxing Liu, Hong Gu, Huaqi Zhang, Xinguo Liu

Figure 1 for CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models

Figure 2 for CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models

Figure 3 for CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models

Figure 4 for CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models

Abstract:Text-to-image diffusion models excel at generating photorealistic images, but commonly struggle to render accurate spatial relationships described in text prompts. We identify two core issues underlying this common failure: 1) the ambiguous nature of spatial-related data in existing datasets, and 2) the inability of current text encoders to accurately interpret the spatial semantics of input descriptions. We address these issues with CoMPaSS, a versatile training framework that enhances spatial understanding of any T2I diffusion model. CoMPaSS solves the ambiguity of spatial-related data with the Spatial Constraints-Oriented Pairing (SCOP) data engine, which curates spatially-accurate training data through a set of principled spatial constraints. To better exploit the curated high-quality spatial priors, CoMPaSS further introduces a Token ENcoding ORdering (TENOR) module to allow better exploitation of high-quality spatial priors, effectively compensating for the shortcoming of text encoders. Extensive experiments on four popular open-weight T2I diffusion models covering both UNet- and MMDiT-based architectures demonstrate the effectiveness of CoMPaSS by setting new state-of-the-arts with substantial relative gains across well-known benchmarks on spatial relationships generation, including VISOR (+98%), T2I-CompBench Spatial (+67%), and GenEval Position (+131%). Code will be available at https://github.com/blurgyy/CoMPaSS.

* 18 pages, 11 figures

Via

Access Paper or Ask Questions

DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data

May 16, 2024

Chengxiang Fan, Muzhi Zhu, Hao Chen, Yang Liu, Weijia Wu, Huaqi Zhang, Chunhua Shen

Abstract:Instance segmentation is data-hungry, and as model capacity increases, data scale becomes crucial for improving the accuracy. Most instance segmentation datasets today require costly manual annotation, limiting their data scale. Models trained on such data are prone to overfitting on the training set, especially for those rare categories. While recent works have delved into exploiting generative models to create synthetic datasets for data augmentation, these approaches do not efficiently harness the full potential of generative models. To address these issues, we introduce a more efficient strategy to construct generative datasets for data augmentation, termed DiverGen. Firstly, we provide an explanation of the role of generative data from the perspective of distribution discrepancy. We investigate the impact of different data on the distribution learned by the model. We argue that generative data can expand the data distribution that the model can learn, thus mitigating overfitting. Additionally, we find that the diversity of generative data is crucial for improving model performance and enhance it through various strategies, including category diversity, prompt diversity, and generative model diversity. With these strategies, we can scale the data to millions while maintaining the trend of model performance improvement. On the LVIS dataset, DiverGen significantly outperforms the strong model X-Paste, achieving +1.1 box AP and +1.1 mask AP across all categories, and +1.9 box AP and +2.5 mask AP for rare categories.

* Accepted to CVPR 2024, codes are available at \href{this https URL}{https://github.com/aim-uofa/DiverGen}

Via

Access Paper or Ask Questions

Guided Colorization Using Mono-Color Image Pairs

Aug 17, 2021

Ze-Hua Sheng, Hui-Liang Shen, Bo-Wen Yao, Huaqi Zhang

Figure 1 for Guided Colorization Using Mono-Color Image Pairs

Figure 2 for Guided Colorization Using Mono-Color Image Pairs

Figure 3 for Guided Colorization Using Mono-Color Image Pairs

Figure 4 for Guided Colorization Using Mono-Color Image Pairs

Abstract:Compared to color images captured by conventional RGB cameras, monochrome images usually have better signal-to-noise ratio (SNR) and richer textures due to its higher quantum efficiency. It is thus natural to apply a mono-color dual-camera system to restore color images with higher visual quality. In this paper, we propose a mono-color image enhancement algorithm that colorizes the monochrome image with the color one. Based on the assumption that adjacent structures with similar luminance values are likely to have similar colors, we first perform dense scribbling to assign colors to the monochrome pixels through block matching. Two types of outliers, including occlusion and color ambiguity, are detected and removed from the initial scribbles. We also introduce a sampling strategy to accelerate the scribbling process. Then, the dense scribbles are propagated to the entire image. To alleviate incorrect color propagation in the regions that have no color hints at all, we generate extra color seeds based on the existed scribbles to guide the propagation process. Experimental results show that, our algorithm can efficiently restore color images with higher SNR and richer details from the mono-color image pairs, and achieves good performance in solving the color bleeding problem.

Via

Access Paper or Ask Questions