Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Qiuxia Lai

PAMD: Plausibility-Aware Motion Diffusion Model for Long Dance Generation

May 26, 2025

Hongsong Wang, Yin Zhu, Qiuxia Lai, Yang Zhang, Guo-Sen Xie, Xin Geng

Abstract:Computational dance generation is crucial in many areas, such as art, human-computer interaction, virtual reality, and digital entertainment, particularly for generating coherent and expressive long dance sequences. Diffusion-based music-to-dance generation has made significant progress, yet existing methods still struggle to produce physically plausible motions. To address this, we propose Plausibility-Aware Motion Diffusion (PAMD), a framework for generating dances that are both musically aligned and physically realistic. The core of PAMD lies in the Plausible Motion Constraint (PMC), which leverages Neural Distance Fields (NDFs) to model the actual pose manifold and guide generated motions toward a physically valid pose manifold. To provide more effective guidance during generation, we incorporate Prior Motion Guidance (PMG), which uses standing poses as auxiliary conditions alongside music features. To further enhance realism for complex movements, we introduce the Motion Refinement with Foot-ground Contact (MRFC) module, which addresses foot-skating artifacts by bridging the gap between the optimization objective in linear joint position space and the data representation in nonlinear rotation space. Extensive experiments show that PAMD significantly improves musical alignment and enhances the physical plausibility of generated motions. This project page is available at: https://mucunzhuzhu.github.io/PAMD-page/.

* This project page is available at: https://mucunzhuzhu.github.io/PAMD-page/

Via

Access Paper or Ask Questions

Vector Quantization Prompting for Continual Learning

Oct 27, 2024

Li Jiao, Qiuxia Lai, Yu Li, Qiang Xu

Figure 1 for Vector Quantization Prompting for Continual Learning

Figure 2 for Vector Quantization Prompting for Continual Learning

Figure 3 for Vector Quantization Prompting for Continual Learning

Figure 4 for Vector Quantization Prompting for Continual Learning

Abstract:Continual learning requires to overcome catastrophic forgetting when training a single model on a sequence of tasks. Recent top-performing approaches are prompt-based methods that utilize a set of learnable parameters (i.e., prompts) to encode task knowledge, from which appropriate ones are selected to guide the fixed pre-trained model in generating features tailored to a certain task. However, existing methods rely on predicting prompt identities for prompt selection, where the identity prediction process cannot be optimized with task loss. This limitation leads to sub-optimal prompt selection and inadequate adaptation of pre-trained features for a specific task. Previous efforts have tried to address this by directly generating prompts from input queries instead of selecting from a set of candidates. However, these prompts are continuous, which lack sufficient abstraction for task knowledge representation, making them less effective for continual learning. To address these challenges, we propose VQ-Prompt, a prompt-based continual learning method that incorporates Vector Quantization (VQ) into end-to-end training of a set of discrete prompts. In this way, VQ-Prompt can optimize the prompt selection process with task loss and meanwhile achieve effective abstraction of task knowledge for continual learning. Extensive experiments show that VQ-Prompt outperforms state-of-the-art continual learning methods across a variety of benchmarks under the challenging class-incremental setting. The code is available at \href{https://github.com/jiaolifengmi/VQ-Prompt}{this https URL}.

* To appear in NeurIPS 2024

Via

Access Paper or Ask Questions

Poly Kernel Inception Network for Remote Sensing Detection

Mar 20, 2024

Xinhao Cai, Qiuxia Lai, Yuwei Wang, Wenguan Wang, Zeren Sun, Yazhou Yao

Abstract:Object detection in remote sensing images (RSIs) often suffers from several increasing challenges, including the large variation in object scales and the diverse-ranging context. Prior methods tried to address these challenges by expanding the spatial receptive field of the backbone, either through large-kernel convolution or dilated convolution. However, the former typically introduces considerable background noise, while the latter risks generating overly sparse feature representations. In this paper, we introduce the Poly Kernel Inception Network (PKINet) to handle the above challenges. PKINet employs multi-scale convolution kernels without dilation to extract object features of varying scales and capture local context. In addition, a Context Anchor Attention (CAA) module is introduced in parallel to capture long-range contextual information. These two components work jointly to advance the performance of PKINet on four challenging remote sensing detection benchmarks, namely DOTA-v1.0, DOTA-v1.5, HRSC2016, and DIOR-R.

* accepted by IEEE Conference on Computer Vision and Pattern Recognition, 2024

Via

Access Paper or Ask Questions

Evaluating Text-to-Image Generative Models: An Empirical Study on Human Image Synthesis

Mar 08, 2024

Muxi Chen, Yi Liu, Jian Yi, Changran Xu, Qiuxia Lai, Hongliang Wang, Tsung-Yi Ho, Qiang Xu

Abstract:In this paper, we present an empirical study introducing a nuanced evaluation framework for text-to-image (T2I) generative models, applied to human image synthesis. Our framework categorizes evaluations into two distinct groups: first, focusing on image qualities such as aesthetics and realism, and second, examining text conditions through concept coverage and fairness. We introduce an innovative aesthetic score prediction model that assesses the visual appeal of generated images and unveils the first dataset marked with low-quality regions in generated human images to facilitate automatic defect detection. Our exploration into concept coverage probes the model's effectiveness in interpreting and rendering text-based concepts accurately, while our analysis of fairness reveals biases in model outputs, with an emphasis on gender, race, and age. While our study is grounded in human imagery, this dual-faceted approach is designed with the flexibility to be applicable to other forms of image generation, enhancing our understanding of generative models and paving the way to the next generation of more sophisticated, contextually aware, and ethically attuned generative models. We will release our code, the data used for evaluating generative models and the dataset annotated with defective areas soon.

Via

Access Paper or Ask Questions

BIFRNet: A Brain-Inspired Feature Restoration DNN for Partially Occluded Image Recognition

Mar 16, 2023

Jiahong Zhang, Lihong Cao, Qiuxia Lai, Binyao Li, Yunxiao Qin

Figure 1 for BIFRNet: A Brain-Inspired Feature Restoration DNN for Partially Occluded Image Recognition

Figure 2 for BIFRNet: A Brain-Inspired Feature Restoration DNN for Partially Occluded Image Recognition

Figure 3 for BIFRNet: A Brain-Inspired Feature Restoration DNN for Partially Occluded Image Recognition

Figure 4 for BIFRNet: A Brain-Inspired Feature Restoration DNN for Partially Occluded Image Recognition

Abstract:The partially occluded image recognition (POIR) problem has been a challenge for artificial intelligence for a long time. A common strategy to handle the POIR problem is using the non-occluded features for classification. Unfortunately, this strategy will lose effectiveness when the image is severely occluded, since the visible parts can only provide limited information. Several studies in neuroscience reveal that feature restoration which fills in the occluded information and is called amodal completion is essential for human brains to recognize partially occluded images. However, feature restoration is commonly ignored by CNNs, which may be the reason why CNNs are ineffective for the POIR problem. Inspired by this, we propose a novel brain-inspired feature restoration network (BIFRNet) to solve the POIR problem. It mimics a ventral visual pathway to extract image features and a dorsal visual pathway to distinguish occluded and visible image regions. In addition, it also uses a knowledge module to store object prior knowledge and uses a completion module to restore occluded features based on visible features and prior knowledge. Thorough experiments on synthetic and real-world occluded image datasets show that BIFRNet outperforms the existing methods in solving the POIR problem. Especially for severely occluded images, BIRFRNet surpasses other methods by a large margin and is close to the human brain performance. Furthermore, the brain-inspired design makes BIFRNet more interpretable.

* This paper has been accepted by AAAI-2023

Via

Access Paper or Ask Questions

DeepSAT: An EDA-Driven Learning Framework for SAT

May 27, 2022

Min Li, Zhengyuan Shi, Qiuxia Lai, Sadaf Khan, Qiang Xu

Figure 1 for DeepSAT: An EDA-Driven Learning Framework for SAT

Figure 2 for DeepSAT: An EDA-Driven Learning Framework for SAT

Figure 3 for DeepSAT: An EDA-Driven Learning Framework for SAT

Figure 4 for DeepSAT: An EDA-Driven Learning Framework for SAT

Abstract:We present DeepSAT, a novel end-to-end learning framework for the Boolean satisfiability (SAT) problem. Unlike existing solutions trained on random SAT instances with relatively weak supervisions, we propose applying the knowledge of the well-developed electronic design automation (EDA) field for SAT solving. Specifically, we first resort to advanced logic synthesis algorithms to pre-process SAT instances into optimized and-inverter graphs (AIGs). By doing so, our training and test sets have a unified distribution, thus the learned model can generalize well to test sets of various sources of SAT instances. Next, we regard the distribution of SAT solutions being a product of conditional Bernoulli distributions. Based on this observation, we approximate the SAT solving procedure with a conditional generative model, leveraging a directed acyclic graph neural network with two polarity prototypes for conditional SAT modeling. To effectively train the generative model, with the help of logic simulation tools, we obtain the probabilities of nodes in the AIG being logic '1' as rich supervision. We conduct extensive experiments on various SAT instances. DeepSAT achieves significant accuracy improvements over state-of-the-art learning-based SAT solutions, especially when generalized to SAT instances that are large or with diverse distributions.

* 11 pages, 5 figures

Via

Access Paper or Ask Questions

What You See is Not What the Network Infers: Detecting Adversarial Examples Based on Semantic Contradiction

Jan 24, 2022

Yijun Yang, Ruiyuan Gao, Yu Li, Qiuxia Lai, Qiang Xu

Figure 1 for What You See is Not What the Network Infers: Detecting Adversarial Examples Based on Semantic Contradiction

Figure 2 for What You See is Not What the Network Infers: Detecting Adversarial Examples Based on Semantic Contradiction

Figure 3 for What You See is Not What the Network Infers: Detecting Adversarial Examples Based on Semantic Contradiction

Figure 4 for What You See is Not What the Network Infers: Detecting Adversarial Examples Based on Semantic Contradiction

Abstract:Adversarial examples (AEs) pose severe threats to the applications of deep neural networks (DNNs) to safety-critical domains, e.g., autonomous driving. While there has been a vast body of AE defense solutions, to the best of our knowledge, they all suffer from some weaknesses, e.g., defending against only a subset of AEs or causing a relatively high accuracy loss for legitimate inputs. Moreover, most existing solutions cannot defend against adaptive attacks, wherein attackers are knowledgeable about the defense mechanisms and craft AEs accordingly. In this paper, we propose a novel AE detection framework based on the very nature of AEs, i.e., their semantic information is inconsistent with the discriminative features extracted by the target DNN model. To be specific, the proposed solution, namely ContraNet, models such contradiction by first taking both the input and the inference result to a generator to obtain a synthetic output and then comparing it against the original input. For legitimate inputs that are correctly inferred, the synthetic output tries to reconstruct the input. On the contrary, for AEs, instead of reconstructing the input, the synthetic output would be created to conform to the wrong label whenever possible. Consequently, by measuring the distance between the input and the synthetic output with metric learning, we can differentiate AEs from legitimate inputs. We perform comprehensive evaluations under various AE attack scenarios, and experimental results show that ContraNet outperforms existing solutions by a large margin, especially under adaptive attacks. Moreover, our analysis shows that successful AEs that can bypass ContraNet tend to have much-weakened adversarial semantics. We have also shown that ContraNet can be easily combined with adversarial training techniques to achieve further improved AE defense capabilities.

* Accepted to NDSS 2022. Camera-ready version with supplementary materials. Code is available at https://github.com/cure-lab/ContraNet.git

Via

Access Paper or Ask Questions

Information Bottleneck Approach to Spatial Attention Learning

Aug 07, 2021

Qiuxia Lai, Yu Li, Ailing Zeng, Minhao Liu, Hanqiu Sun, Qiang Xu

Figure 1 for Information Bottleneck Approach to Spatial Attention Learning

Figure 2 for Information Bottleneck Approach to Spatial Attention Learning

Figure 3 for Information Bottleneck Approach to Spatial Attention Learning

Figure 4 for Information Bottleneck Approach to Spatial Attention Learning

Abstract:The selective visual attention mechanism in the human visual system (HVS) restricts the amount of information to reach visual awareness for perceiving natural scenes, allowing near real-time information processing with limited computational capacity [Koch and Ullman, 1987]. This kind of selectivity acts as an 'Information Bottleneck (IB)', which seeks a trade-off between information compression and predictive accuracy. However, such information constraints are rarely explored in the attention mechanism for deep neural networks (DNNs). In this paper, we propose an IB-inspired spatial attention module for DNN structures built for visual recognition. The module takes as input an intermediate representation of the input image, and outputs a variational 2D attention map that minimizes the mutual information (MI) between the attention-modulated representation and the input, while maximizing the MI between the attention-modulated representation and the task label. To further restrict the information bypassed by the attention map, we quantize the continuous attention scores to a set of learnable anchor values during training. Extensive experiments show that the proposed IB-inspired spatial attention mechanism can yield attention maps that neatly highlight the regions of interest while suppressing backgrounds, and bootstrap standard DNN structures for visual recognition tasks (e.g., image classification, fine-grained recognition, cross-domain classification). The attention maps are interpretable for the decision making of the DNNs as verified in the experiments. Our code is available at https://github.com/ashleylqx/AIB.git.

* To appear in IJCAI 2021; with supplymentary

Via

Access Paper or Ask Questions

Time Series is a Special Sequence: Forecasting with Sample Convolution and Interaction

Jun 17, 2021

Minhao Liu, Ailing Zeng, Qiuxia Lai, Qiang Xu

Figure 1 for Time Series is a Special Sequence: Forecasting with Sample Convolution and Interaction

Figure 2 for Time Series is a Special Sequence: Forecasting with Sample Convolution and Interaction

Figure 3 for Time Series is a Special Sequence: Forecasting with Sample Convolution and Interaction

Figure 4 for Time Series is a Special Sequence: Forecasting with Sample Convolution and Interaction

Abstract:Time series is a special type of sequence data, a set of observations collected at even intervals of time and ordered chronologically. Existing deep learning techniques use generic sequence models (e.g., recurrent neural network, Transformer model, or temporal convolutional network) for time series analysis, which ignore some of its unique properties. For example, the downsampling of time series data often preserves most of the information in the data, while this is not true for general sequence data such as text sequence and DNA sequence. Motivated by the above, in this paper, we propose a novel neural network architecture and apply it for the time series forecasting problem, wherein we conduct sample convolution and interaction at multiple resolutions for temporal modeling. The proposed architecture, namelySCINet, facilitates extracting features with enhanced predictability. Experimental results show that SCINet achieves significant prediction accuracy improvement over existing solutions across various real-world time series forecasting datasets. In particular, it can achieve high fore-casting accuracy for those temporal-spatial datasets without using sophisticated spatial modeling techniques. Our codes and data are presented in the supplemental material.

Via

Access Paper or Ask Questions

TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning Tasks

May 21, 2021

Yu Li, Min Li, Qiuxia Lai, Yannan Liu, Qiang Xu

Figure 1 for TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning Tasks

Figure 2 for TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning Tasks

Figure 3 for TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning Tasks

Figure 4 for TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning Tasks

Abstract:Deep learning (DL) has achieved unprecedented success in a variety of tasks. However, DL systems are notoriously difficult to test and debug due to the lack of explainability of DL models and the huge test input space to cover. Generally speaking, it is relatively easy to collect a massive amount of test data, but the labeling cost can be quite high. Consequently, it is essential to conduct test selection and label only those selected "high quality" bug-revealing test inputs for test cost reduction. In this paper, we propose a novel test prioritization technique that brings order into the unlabeled test instances according to their bug-revealing capabilities, namely TestRank. Different from existing solutions, TestRank leverages both intrinsic attributes and contextual attributes of test instances when prioritizing them. To be specific, we first build a similarity graph on test instances and training samples, and we conduct graph-based semi-supervised learning to extract contextual features. Then, for a particular test instance, the contextual features extracted from the graph neural network (GNN) and the intrinsic features obtained with the DL model itself are combined to predict its bug-revealing probability. Finally, TestRank prioritizes unlabeled test instances in descending order of the above probability value. We evaluate the performance of TestRank on a variety of image classification datasets. Experimental results show that the debugging efficiency of our method significantly outperforms existing test prioritization techniques.

Via

Access Paper or Ask Questions