Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Pengming Feng

Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation

Mar 21, 2025

Congyi Fan, Jian Guan, Xuanjia Zhao, Dongli Xu, Youtian Lin, Tong Ye, Pengming Feng, Haiwei Pan

Abstract:Automatically generating natural, diverse and rhythmic human dance movements driven by music is vital for virtual reality and film industries. However, generating dance that naturally follows music remains a challenge, as existing methods lack proper beat alignment and exhibit unnatural motion dynamics. In this paper, we propose Danceba, a novel framework that leverages gating mechanism to enhance rhythm-aware feature representation for music-driven dance generation, which achieves highly aligned dance poses with enhanced rhythmic sensitivity. Specifically, we introduce Phase-Based Rhythm Extraction (PRE) to precisely extract rhythmic information from musical phase data, capitalizing on the intrinsic periodicity and temporal structures of music. Additionally, we propose Temporal-Gated Causal Attention (TGCA) to focus on global rhythmic features, ensuring that dance movements closely follow the musical rhythm. We also introduce Parallel Mamba Motion Modeling (PMMM) architecture to separately model upper and lower body motions along with musical features, thereby improving the naturalness and diversity of generated dance movements. Extensive experiments confirm that Danceba outperforms state-of-the-art methods, achieving significantly better rhythmic alignment and motion diversity. Project page: https://danceba.github.io/ .

* 10 pages, 6 figures

Via

Access Paper or Ask Questions

Band Prompting Aided SAR and Multi-Spectral Data Fusion Framework for Local Climate Zone Classification

Dec 24, 2024

Haiyan Lan, Shujun Li, Mingjie Xie, Xuanjia Zhao, Hongning Liu, Pengming Feng, Dongli Xu, Guangjun He, Jian Guan

Abstract:Local climate zone (LCZ) classification is of great value for understanding the complex interactions between urban development and local climate. Recent studies have increasingly focused on the fusion of synthetic aperture radar (SAR) and multi-spectral data to improve LCZ classification performance. However, it remains challenging due to the distinct physical properties of these two types of data and the absence of effective fusion guidance. In this paper, a novel band prompting aided data fusion framework is proposed for LCZ classification, namely BP-LCZ, which utilizes textual prompts associated with band groups to guide the model in learning the physical attributes of different bands and semantics of various categories inherent in SAR and multi-spectral data to augment the fused feature, thus enhancing LCZ classification performance. Specifically, a band group prompting (BGP) strategy is introduced to align the visual representation effectively at the level of band groups, which also facilitates a more adequate extraction of semantic information of different bands with textual information. In addition, a multivariate supervised matrix (MSM) based training strategy is proposed to alleviate the problem of positive and negative sample confusion by completing the supervised information. The experimental results demonstrate the effectiveness and superiority of the proposed data fusion framework.

* Accepted by ICASSP 2025

Via

Access Paper or Ask Questions

FastDrag: Manipulate Anything in One Step

May 24, 2024

Xuanjia Zhao, Jian Guan, Congyi Fan, Dongli Xu, Youtian Lin, Haiwei Pan, Pengming Feng

Abstract:Drag-based image editing using generative models provides precise control over image contents, enabling users to manipulate anything in an image with a few clicks. However, prevailing methods typically adopt $n$-step iterations for latent semantic optimization to achieve drag-based image editing, which is time-consuming and limits practical applications. In this paper, we introduce a novel one-step drag-based image editing method, i.e., FastDrag, to accelerate the editing process. Central to our approach is a latent warpage function (LWF), which simulates the behavior of a stretched material to adjust the location of individual pixels within the latent space. This innovation achieves one-step latent semantic optimization and hence significantly promotes editing speeds. Meanwhile, null regions emerging after applying LWF are addressed by our proposed bilateral nearest neighbor interpolation (BNNI) strategy. This strategy interpolates these regions using similar features from neighboring areas, thus enhancing semantic integrity. Additionally, a consistency-preserving strategy is introduced to maintain the consistency between the edited and original images by adopting semantic information from the original image, saved as key and value pairs in self-attention module during diffusion inversion, to guide the diffusion sampling. Our FastDrag is validated on the DragBench dataset, demonstrating substantial improvements in processing time over existing methods, while achieving enhanced editing performance.

* 13 pages

Via

Access Paper or Ask Questions

SISP: A Benchmark Dataset for Fine-grained Ship Instance Segmentation in Panchromatic Satellite Images

Feb 06, 2024

Pengming Feng, Mingjie Xie, Hongning Liu, Xuanjia Zhao, Guangjun He, Xueliang Zhang, Jian Guan

Abstract:Fine-grained ship instance segmentation in satellite images holds considerable significance for monitoring maritime activities at sea. However, existing datasets often suffer from the scarcity of fine-grained information or pixel-wise localization annotations, as well as the insufficient image diversity and variations, thus limiting the research of this task. To this end, we propose a benchmark dataset for fine-grained Ship Instance Segmentation in Panchromatic satellite images, namely SISP, which contains 56,693 well-annotated ship instances with four fine-grained categories across 10,000 sliced images, and all the images are collected from SuperView-1 satellite with the resolution of 0.5m. Targets in the proposed SISP dataset have characteristics that are consistent with real satellite scenes, such as high class imbalance, various scenes, large variations in target densities and scales, and high inter-class similarity and intra-class diversity, all of which make the SISP dataset more suitable for real-world applications. In addition, we introduce a Dynamic Feature Refinement-assist Instance segmentation network, namely DFRInst, as the benchmark method for ship instance segmentation in satellite images, which can fortify the explicit representation of crucial features, thus improving the performance of ship instance segmentation. Experiments and analysis are performed on the proposed SISP dataset to evaluate the benchmark method and several state-of-the-art methods to establish baselines for facilitating future research. The proposed dataset and source codes will be available at: https://github.com/Justlovesmile/SISP.

* 14 pages, 9 figures

Via

Access Paper or Ask Questions

EARL: An Elliptical Distribution aided Adaptive Rotation Label Assignment for Oriented Object Detection in Remote Sensing Images

Jan 14, 2023

Jian Guan, Mingjie Xie, Youtian Lin, Guangjun He, Pengming Feng

Figure 1 for EARL: An Elliptical Distribution aided Adaptive Rotation Label Assignment for Oriented Object Detection in Remote Sensing Images

Figure 2 for EARL: An Elliptical Distribution aided Adaptive Rotation Label Assignment for Oriented Object Detection in Remote Sensing Images

Figure 3 for EARL: An Elliptical Distribution aided Adaptive Rotation Label Assignment for Oriented Object Detection in Remote Sensing Images

Figure 4 for EARL: An Elliptical Distribution aided Adaptive Rotation Label Assignment for Oriented Object Detection in Remote Sensing Images

Abstract:Label assignment is often employed in recent convolutional neural network (CNN) based detectors to determine positive or negative samples during training process. However, we note that current label assignment strategies barely consider the characteristics of targets in remote sensing images thoroughly, such as large variations in orientations, aspect ratios and scales, which lead to insufficient sampling. In this paper, an Elliptical Distribution aided Adaptive Rotation Label Assignment (EARL) is proposed to select positive samples with higher quality in orientation detectors, and yields better performance. Concretely, to avoid inadequate sampling of targets with extreme scales, an adaptive scale sampling (ADS) strategy is proposed to dynamically select samples on different feature levels according to the scales of targets. To enhance ADS, positive samples are selected following a dynamic elliptical distribution (DED), which can further exploit the orientation and shape properties of targets. Moreover, a spatial distance weighting (SDW) module is introduced to mitigate the influence from low-quality samples on detection performance. Extensive experiments on popular remote sensing datasets, such as DOTA and HRSC2016, demonstrate the effectiveness and the superiority of our proposed EARL, where without bells and whistles, it achieves 72.87 of mAP on DOTA dataset by being integrated with simple structure, which outperforms current state-of-the-art anchor-free detectors and provides comparable performance as anchor-based methods. The source code will be available at https://github.com/Justlovesmile/EARL

Via

Access Paper or Ask Questions

Low-dimensional Denoising Embedding Transformer for ECG Classification

Mar 31, 2021

Jian Guan, Wenbo Wang, Pengming Feng, Xinxin Wang, Wenwu Wang

Figure 1 for Low-dimensional Denoising Embedding Transformer for ECG Classification

Figure 2 for Low-dimensional Denoising Embedding Transformer for ECG Classification

Figure 3 for Low-dimensional Denoising Embedding Transformer for ECG Classification

Figure 4 for Low-dimensional Denoising Embedding Transformer for ECG Classification

Abstract:The transformer based model (e.g., FusingTF) has been employed recently for Electrocardiogram (ECG) signal classification. However, the high-dimensional embedding obtained via 1-D convolution and positional encoding can lead to the loss of the signal's own temporal information and a large amount of training parameters. In this paper, we propose a new method for ECG classification, called low-dimensional denoising embedding transformer (LDTF), which contains two components, i.e., low-dimensional denoising embedding (LDE) and transformer learning. In the LDE component, a low-dimensional representation of the signal is obtained in the time-frequency domain while preserving its own temporal information. And with the low dimensional embedding, the transformer learning is then used to obtain a deeper and narrower structure with fewer training parameters than that of the FusingTF. Experiments conducted on the MIT-BIH dataset demonstrates the effectiveness and the superior performance of our proposed method, as compared with state-of-the-art methods.

* To appear at ICASSP 2021

Via

Access Paper or Ask Questions

A Practical Solution for SAR Despeckling with Only Single Speckled Images

Dec 13, 2019

Ye Yuan, Jianguo Sun, Jian Guan, Pengming Feng, Yanxia Wu

Figure 1 for A Practical Solution for SAR Despeckling with Only Single Speckled Images

Figure 2 for A Practical Solution for SAR Despeckling with Only Single Speckled Images

Figure 3 for A Practical Solution for SAR Despeckling with Only Single Speckled Images

Figure 4 for A Practical Solution for SAR Despeckling with Only Single Speckled Images

Abstract:In this letter, we aim to address synthetic aperture radar (SAR) despeckling problem with the necessity of neither clean (speckle-free) SAR images nor independent speckled image pairs from the same scene, a practical solution for SAR despeckling (PSD) is proposed. Firstly, to generate speckled-to-speckled (S2S) image pairs from the same scene in the situation of only single speckled SAR images are available, an adversarial learning framework is designed. Then, the S2S SAR image pairs are employed to train a modified despeckling Nested-UNet model using the Noise2Noise (N2N) strategy. Moreover, an iterative version of the PSD method (PSDi) is also proposed. The performance of the proposed methods is demonstrated by both synthetic speckled and real SAR data. SAR block-matching 3-D algorithm (SAR-BM3D) and SAR dilated residual network (SAR-DRN) are used in the visual and quantitative comparison. Experimental results show that the proposed methods can reach a good tradeoff between speckle suppression and edge preservation.

* 5 pages, 5 figures

Via

Access Paper or Ask Questions

IENet: Interacting Embranchment One Stage Anchor Free Detector for Orientation Aerial Object Detection

Dec 02, 2019

Youtian Lin, Pengming Feng, Jian Guan

Figure 1 for IENet: Interacting Embranchment One Stage Anchor Free Detector for Orientation Aerial Object Detection

Figure 2 for IENet: Interacting Embranchment One Stage Anchor Free Detector for Orientation Aerial Object Detection

Figure 3 for IENet: Interacting Embranchment One Stage Anchor Free Detector for Orientation Aerial Object Detection

Figure 4 for IENet: Interacting Embranchment One Stage Anchor Free Detector for Orientation Aerial Object Detection

Abstract:Object detection in aerial images is a challenging task due to its lack of visiable features and variant orientation of objects. Currently, amount of R-CNN framework based detectors have made significant progress in predicting targets by horizontal bounding boxes (HBB) and oriented bounding boxes (OBB). However, there is still open space for one-stage anchor free solutions. This paper proposes a one-stage anchor free detector for orientional object in aerial images, which is built upon a per-pixel prediction fashion detector. We make it possible by developing a branch interacting module with a self-attention mechanism to fuse features from classification and box regression branchs. Moreover a geometric transformation is employed in angle prediction to make it more manageable for the prediction network. We also introduce an IOU loss for OBB detection, which is more efficient than regular polygon IOU. The propsed method is evaluated on DOTA and HRSC2016 datasets, and the outcomes show the higher OBB detection performance from our propsed IENet when compared with the state-of-the-art detectors.

Via

Access Paper or Ask Questions