Wenqi Liang

GLAM: Global-Local Variation Awareness in Mamba-based World Model

Jan 21, 2025

Qian He, Wenqi Liang, Chunhui Hao, Gan Sun, Jiandong Tian

Abstract: Mimicking the real interaction trajectory in the inference of the world model has been shown to improve the sample efficiency of model-based reinforcement learning (MBRL) algorithms. Many methods directly use known state sequences for reasoning. However, this approach fails to enhance the quality of reasoning by capturing the subtle variation between states. Much like how humans infer trends in event development from this variation, in this work, we introduce Global-Local variation Awareness Mamba-based world model (GLAM) that improves reasoning quality by perceiving and predicting variation between states. GLAM comprises two Mamba-based parallel reasoning modules, GMamba and LMamba, which focus on perceiving variation from global and local perspectives, respectively, during the reasoning process. GMamba focuses on identifying patterns of variation between states in the input sequence and leverages these patterns to enhance the prediction of future state variation. LMamba emphasizes reasoning about unknown information, such as rewards, termination signals, and visual representations, by perceiving variation in adjacent states. By integrating the strengths of the two modules, GLAM accounts for higher-value variation in environmental changes, providing the agent with more efficient imagination-based training. We demonstrate that our method outperforms existing methods in normalized human scores on the Atari 100k benchmark.
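The abstract reports results as normalized human scores. As a minimal sketch, assuming the commonly used definition (agent score minus random score, divided by human score minus random score), the metric can be computed as below. The game names and raw scores are hypothetical placeholders for illustration, not results from the paper.

```python
def human_normalized_score(agent_score: float, random_score: float, human_score: float) -> float:
    """Return (agent - random) / (human - random); 0 is random play, 1 is human level."""
    if human_score == random_score:
        raise ValueError("human and random scores must differ")
    return (agent_score - random_score) / (human_score - random_score)


# Hypothetical raw scores for illustration only; not taken from the paper.
raw_scores = {
    "game_a": {"agent_score": 60.0, "random_score": 10.0, "human_score": 110.0},
    "game_b": {"agent_score": 30.0, "random_score": 0.0, "human_score": 20.0},
}

# Normalize each game, then average across games.
normalized = {name: human_normalized_score(**s) for name, s in raw_scores.items()}
mean_score = sum(normalized.values()) / len(normalized)

print(normalized)
print(mean_score)
```

A value above 1 for a game means the agent exceeded the human reference on that game; benchmark summaries typically report the mean or median of these per-game values.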

How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization?

Oct 23, 2024

Jiahua Dong, Wenqi Liang, Hongliu Li, Duzhen Zhang, Meng Cao, Henghui Ding, Salman Khan, Fahad Shahbaz Khan

Abstract: Custom diffusion models (CDMs) have attracted widespread attention due to their astonishing generative ability for personalized concepts. However, most existing CDMs unreasonably assume that personalized concepts are fixed and cannot change over time. Moreover, they suffer heavily from catastrophic forgetting and concept neglect on old personalized concepts when continually learning a series of new concepts. To address these challenges, we propose a novel Concept-Incremental text-to-image Diffusion Model (CIDM), which can resolve catastrophic forgetting and concept neglect to learn new customization tasks in a concept-incremental manner. Specifically, to surmount the catastrophic forgetting of old concepts, we develop a concept consolidation loss and an elastic weight aggregation module. They can explore task-specific and task-shared knowledge during training, and aggregate all low-rank weights of old concepts based on their contributions during inference. Moreover, to address concept neglect, we devise a context-controllable synthesis strategy that leverages expressive region features and noise estimation to control the contexts of generated images according to user conditions. Experiments validate that our CIDM surpasses existing custom diffusion models. The source code is available at https://github.com/JiahuaDong/CIFC.
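The abstract mentions aggregating the low-rank weights of old concepts according to their contributions at inference time, without stating how contributions are computed. The sketch below shows one generic way such an aggregation could look: each concept contributes a low-rank update `B @ A`, and the updates are mixed with softmax weights. The function names, the softmax weighting, and the `scores` input are assumptions made for illustration; this is not the paper's elastic weight aggregation module.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x r) and b (r x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_low_rank(base: Matrix, adapters: list[tuple[Matrix, Matrix]],
                       scores: list[float]) -> Matrix:
    """Return base + sum_i w_i * (B_i @ A_i), with w = softmax(scores).

    `scores` stands in for per-concept contribution estimates (hypothetical).
    """
    weights = softmax(scores)
    merged = [row[:] for row in base]
    for w, (b, a) in zip(weights, adapters):
        delta = matmul(b, a)  # rank-r update for one concept
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += w * value
    return merged


# Two rank-1 adapters on a 2x2 weight matrix, with equal contribution scores.
base = [[0.0, 0.0], [0.0, 0.0]]
adapters = [([[1.0], [0.0]], [[1.0, 0.0]]),
            ([[0.0], [1.0]], [[0.0, 1.0]])]
merged = aggregate_low_rank(base, adapters, scores=[0.0, 0.0])
print(merged)
```

With equal scores each adapter receives weight 0.5, so the merged matrix is half of each concept's update added to the base weights.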

* Accepted to NeurIPS2024

MuseumMaker: Continual Style Customization without Catastrophic Forgetting

Apr 29, 2024

Chenxi Liu, Gan Sun, Wenqi Liang, Jiahua Dong, Can Qin, Yang Cong

Abstract: Pre-trained large text-to-image (T2I) models with an appropriate text prompt have attracted growing interest in the field of customized image generation. However, the catastrophic forgetting issue makes it hard to continually synthesize new user-provided styles while retaining satisfying results on learned styles. In this paper, we propose MuseumMaker, a method that enables the synthesis of images by following a set of customized styles in a never-ending manner, and gradually accumulates these creative artistic works as a Museum. When facing a new customization style, we develop a style distillation loss module to extract and learn the styles of the training data for new image generation. It can minimize the learning biases caused by the content of new training images, and address the catastrophic overfitting issue induced by few-shot images. To deal with catastrophic forgetting of previously learned styles, we devise a dual regularization for shared-LoRA module to optimize the direction of model update, which regularizes the diffusion model from both weight and feature aspects. Meanwhile, to further preserve historical knowledge from past styles and address the limited representability of LoRA, we consider a task-wise token learning module where a unique token embedding is learned to denote a new style. As each new user-provided style arrives, our MuseumMaker can capture the nuances of the new style while maintaining the details of learned styles. Experimental results on diverse style datasets validate the effectiveness of our proposed MuseumMaker method, showcasing its robustness and versatility across various scenarios.


Never-Ending Embodied Robot Learning

Mar 01, 2024

Wenqi Liang, Gan Sun, Qian He, Yu Ren, Jiahua Dong, Yang Cong

Abstract: Relying on large language models (LLMs), embodied robots can perform complex multimodal robot manipulation tasks from visual observations with powerful generalization ability. However, most visual behavior-cloning agents suffer from manipulation performance degradation and skill knowledge forgetting when adapting to a series of challenging unseen tasks. We here investigate the above challenge with NBCagent in embodied robots, a pioneering language-conditioned Never-ending Behavior-Cloning agent, which can continually learn observation knowledge of novel robot manipulation skills from skill-specific and skill-shared attributes. Specifically, we establish a skill-specific evolving planner to perform knowledge decoupling, which can continually embed novel skill-specific knowledge in our NBCagent from latent and low-rank space. Meanwhile, we propose a skill-shared semantics rendering module and a skill-shared representation distillation module to effectively transfer anti-forgetting skill-shared knowledge, further tackling catastrophic forgetting on old skills from semantics and representation aspects. Finally, we design a continual embodied robot manipulation benchmark, and extensive experiments demonstrate the significant performance of our method. Visual results, code, and dataset are provided at: https://neragent.github.io.

* 14 pages, 5 figures, 8 tables

Create Your World: Lifelong Text-to-Image Diffusion

Sep 08, 2023

Gan Sun, Wenqi Liang, Jiahua Dong, Jun Li, Zhengming Ding, Yang Cong

Abstract: Text-to-image generative models can produce diverse high-quality images of concepts from a text prompt, and have demonstrated excellent ability in image generation, image translation, etc. In this work, we study the problem of synthesizing instantiations of a user's own concepts in a never-ending manner, i.e., create your world, where the new concepts from the user are quickly learned with a few examples. To achieve this goal, we propose a Lifelong text-to-image Diffusion Model (L2DM), which intends to overcome knowledge "catastrophic forgetting" for the past encountered concepts, and semantic "catastrophic neglecting" for one or more concepts in the text prompt. With respect to knowledge "catastrophic forgetting", our L2DM framework devises a task-aware memory enhancement module and an elastic-concept distillation module, which respectively safeguard the knowledge of both prior concepts and each past personalized concept. When generating images with a user text prompt, the solution to semantic "catastrophic neglecting" is that a concept attention artist module can alleviate the semantic neglecting from the concept aspect, and an orthogonal attention module can reduce the semantic binding from the attribute aspect. To this end, our model can generate more faithful images across a range of continual text prompts in terms of both qualitative and quantitative metrics, compared with the related state-of-the-art models. The code will be released at https://wenqiliang.github.io/.

* 15 pages, 10 figures

I3DOD: Towards Incremental 3D Object Detection via Prompting

Aug 24, 2023

Wenqi Liang, Gan Sun, Chenxi Liu, Jiahua Dong, Kangru Wang

Abstract: 3D object detection has achieved significant performance in many fields, e.g., robotic systems, autonomous driving, and augmented reality. However, most existing methods can suffer catastrophic forgetting of old classes when applied in class-incremental scenarios. Meanwhile, current class-incremental 3D object detection methods neglect the relationships between object localization information and category semantic information, and assume that all the knowledge of the old model is reliable. To address the above challenge, we present a novel Incremental 3D Object Detection framework with the guidance of prompting, i.e., I3DOD. Specifically, we propose a task-shared prompts mechanism to learn the matching relationships between the object localization information and category semantic information. After training on the current task, these prompts are stored in our prompt pool and used to represent the relationships of old classes in the next task. Moreover, we design a reliable distillation strategy to transfer knowledge from two aspects: a reliable dynamic distillation is developed to filter out the negative knowledge and transfer the reliable 3D knowledge to the new detection model; the relation feature is proposed to capture the response relations in feature space and protect the plasticity of the model when learning novel 3D classes. To this end, we conduct comprehensive experiments on two benchmark datasets, and our method outperforms the state-of-the-art object detection methods by 0.6% to 2.7% in terms of mAP@0.25.
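The reported metric, mAP@0.25, counts a predicted box as a match when its 3D intersection-over-union (IoU) with a ground-truth box is at least 0.25. The sketch below illustrates that matching test for axis-aligned boxes only. This is a simplification: 3D detection benchmarks generally use oriented boxes, and the box format and function names here are assumptions for illustration.

```python
Box = tuple[float, float, float, float, float, float]  # (xmin, ymin, zmin, xmax, ymax, zmax)


def volume(box: Box) -> float:
    """Volume of an axis-aligned box."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def iou_3d(box_a: Box, box_b: Box) -> float:
    """Intersection-over-union of two axis-aligned 3D boxes."""
    intersection = 1.0
    for axis in range(3):
        low = max(box_a[axis], box_b[axis])
        high = min(box_a[axis + 3], box_b[axis + 3])
        if high <= low:  # no overlap along this axis
            return 0.0
        intersection *= high - low
    union = volume(box_a) + volume(box_b) - intersection
    return intersection / union


def is_match(pred: Box, truth: Box, threshold: float = 0.25) -> bool:
    """A prediction matches a ground-truth box when IoU >= threshold."""
    return iou_3d(pred, truth) >= threshold


truth = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
shifted = (1.0, 0.0, 0.0, 3.0, 2.0, 2.0)   # overlaps half of the ground-truth box
corner = (1.0, 1.0, 1.0, 3.0, 3.0, 3.0)    # overlaps only one corner

print(iou_3d(truth, shifted), is_match(shifted, truth))
print(iou_3d(truth, corner), is_match(corner, truth))
```

The first prediction has IoU 4/12 and passes the 0.25 threshold, while the second has IoU 1/15 and does not. Average precision is then computed per class from such matches and averaged over classes.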

* 6 pages, 5 figures

Heterogeneous Forgetting Compensation for Class-Incremental Learning

Aug 24, 2023

Jiahua Dong, Wenqi Liang, Yang Cong, Gan Sun

Abstract: Class-incremental learning (CIL) has achieved remarkable success in learning new classes consecutively while overcoming catastrophic forgetting on old categories. However, most existing CIL methods unreasonably assume that all old categories have the same forgetting pace, and neglect the negative influence of forgetting heterogeneity among different old classes on forgetting compensation. To surmount the above challenges, we develop a novel Heterogeneous Forgetting Compensation (HFC) model, which can resolve heterogeneous forgetting of easy-to-forget and hard-to-forget old categories from both representation and gradient aspects. Specifically, we design a task-semantic aggregation block to alleviate heterogeneous forgetting from the representation aspect. It aggregates local category information within each task to learn task-shared global representations. Moreover, we develop two novel plug-and-play losses: a gradient-balanced forgetting compensation loss and a gradient-balanced relation distillation loss to alleviate forgetting from the gradient aspect. They consider gradient-balanced compensation to rectify forgetting heterogeneity of old categories and heterogeneous relation consistency. Experiments on several representative datasets illustrate the effectiveness of our HFC model. The code is available at https://github.com/JiahuaDong/HFC.

* Accepted to ICCV2023

Understanding Weight Similarity of Neural Networks via Chain Normalization Rule and Hypothesis-Training-Testing

Aug 08, 2022

Guangcong Wang, Guangrun Wang, Wenqi Liang, Jianhuang Lai

Abstract: We present a weight similarity measure method that can quantify the weight similarity of non-convex neural networks. To understand the weight similarity of different trained models, we propose to extract the feature representation from the weights of neural networks. We first normalize the weights of neural networks by introducing a chain normalization rule, which is used for weight representation learning and weight similarity measure. We extend the traditional hypothesis-testing method to a hypothesis-training-testing statistical inference method to validate the hypothesis on the weight similarity of neural networks. With the chain normalization rule and the new statistical inference, we study the weight similarity measure on Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN), and find that the weights of an identical neural network optimized with the Stochastic Gradient Descent (SGD) algorithm converge to a similar local solution in a metric space. The weight similarity measure provides more insight into the local solutions of neural networks. Experiments on several datasets consistently validate the hypothesis of weight similarity measure.
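The abstract normalizes weights before comparing them but does not spell out the chain normalization rule. One well-known reason some normalization is needed is that ReLU networks are invariant to positive rescaling between consecutive layers: two weight sets can compute the same function while differing numerically. The sketch below demonstrates only that general fact on a tiny two-layer network; it is not the paper's chain normalization rule, and all names and values are illustrative assumptions.

```python
import math


def relu(x: float) -> float:
    return max(0.0, x)


def mlp(x: list[float], w1: list[list[float]], w2: list[float]) -> float:
    """Two-layer network without biases: w2 . relu(w1 @ x)."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return sum(w * h for w, h in zip(w2, hidden))


def rescale(w1: list[list[float]], w2: list[float], c: float):
    """Scale layer 1 by c > 0 and layer 2 by 1/c; the network function is unchanged."""
    return [[c * w for w in row] for row in w1], [w / c for w in w2]


def l2_distance(w1_a: list[list[float]], w1_b: list[list[float]]) -> float:
    """Euclidean distance between two first-layer weight matrices."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(w1_a, w1_b)
                         for a, b in zip(row_a, row_b)))


x = [1.0, -2.0]
w1 = [[0.5, -1.0], [1.5, 0.25]]
w2 = [2.0, -3.0]
w1_scaled, w2_scaled = rescale(w1, w2, c=4.0)

out_original = mlp(x, w1, w2)
out_scaled = mlp(x, w1_scaled, w2_scaled)
raw_distance = l2_distance(w1, w1_scaled)

print(out_original, out_scaled, raw_distance)
```

Both weight sets give the same output, yet their raw Euclidean distance is far from zero, which is why comparing unnormalized weights directly can be misleading.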

* Weight Similarity of Neural Networks

Learnable Parameter Similarity

Jul 27, 2019

Guangcong Wang, Jianhuang Lai, Wenqi Liang, Guangrun Wang

Abstract: Most of the existing approaches focus on specific visual tasks while ignoring the relations between them. Estimating task relation sheds light on the learning of high-order semantic concepts, e.g., transfer learning. How to reveal the underlying relations between different visual tasks remains largely unexplored. In this paper, we propose a novel Learnable Parameter Similarity (LPS) method that learns an effective metric to measure the similarity of second-order semantics hidden in trained models. LPS is achieved by using a second-order neural network to align high-dimensional model parameters and learning second-order similarity in an end-to-end way. In addition, we create a model set called ModelSet500 as a parameter similarity learning benchmark that contains 500 trained models. Extensive experiments on ModelSet500 validate the effectiveness of the proposed method. Code will be released at https://github.com/Wanggcong/learnable-parameter-similarity.

* 9 pages

M2M-GAN: Many-to-Many Generative Adversarial Transfer Learning for Person Re-Identification

Nov 09, 2018

Wenqi Liang, Guangcong Wang, Jianhuang Lai, Junyong Zhu

Abstract: Cross-domain transfer learning (CDTL) is an extremely challenging task for person re-identification (ReID). Given a source domain with annotations and a target domain without annotations, CDTL seeks an effective method to transfer the knowledge from the source domain to the target domain. However, such a simple two-domain transfer learning method is inapplicable to person ReID because the source/target domain consists of several sub-domains, e.g., camera-based sub-domains. To address this intractable problem, we propose a novel Many-to-Many Generative Adversarial Transfer Learning method (M2M-GAN) that takes multiple source sub-domains and multiple target sub-domains into consideration and performs each sub-domain transfer mapping from the source domain to the target domain in a unified optimization process. The proposed method first translates the image styles of source sub-domains into those of target sub-domains, and then performs supervised learning using the transferred images and the corresponding annotations in the source domain. As the domain gap is reduced, M2M-GAN achieves a promising result for cross-domain person ReID. Experimental results on three benchmark datasets, Market-1501, DukeMTMC-reID, and MSMT17, show the effectiveness of our M2M-GAN.
