Rang Meng

EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation

Nov 15, 2024

Rang Meng, Xingyu Zhang, Yuming Li, Chenguang Ma

Abstract: Recent work on human animation usually relies on audio, pose, or movement-map conditions to achieve vivid animation quality. However, these methods often face practical challenges due to extra control conditions, cumbersome condition injection modules, or a restriction to driving only the head region. Hence, we ask whether it is possible to achieve striking half-body human animation while removing unnecessary conditions. To this end, we propose a half-body human animation method, dubbed EchoMimicV2, that leverages a novel Audio-Pose Dynamic Harmonization strategy, including Pose Sampling and Audio Diffusion, to enhance half-body details and facial and gestural expressiveness while reducing condition redundancy. To compensate for the scarcity of half-body data, we use Head Partial Attention to seamlessly accommodate headshot data in our training framework; it can be omitted during inference, providing a free lunch for animation. Furthermore, we design the Phase-specific Denoising Loss to guide motion, detail, and low-level quality for animation in their respective phases. We also present a novel benchmark for evaluating half-body human animation. Extensive experiments and analyses demonstrate that EchoMimicV2 surpasses existing methods in both quantitative and qualitative evaluations.
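One way to picture a phase-specific loss is as a switch that picks a loss term according to the denoising timestep. The sketch below is only an illustration under stated assumptions: the three equal-width phases, the ordering (motion at the noisiest steps, detail in the middle, low-level quality at the end), and the names `phase_loss_weights` and `total_loss` are hypothetical and are not taken from the abstract.

```python
def phase_loss_weights(t: int, num_steps: int = 1000) -> dict:
    """Return hypothetical loss weights for denoising timestep t.

    t counts down from num_steps - 1 (pure noise) to 0 (clean sample).
    The equal three-way split is an assumption made for illustration.
    """
    if not 0 <= t < num_steps:
        raise ValueError("t must lie in [0, num_steps)")
    third = num_steps / 3
    if t >= 2 * third:   # noisiest steps: coarse motion (assumed)
        return {"motion": 1.0, "detail": 0.0, "low_level": 0.0}
    if t >= third:       # middle steps: detail (assumed)
        return {"motion": 0.0, "detail": 1.0, "low_level": 0.0}
    # final steps: low-level visual quality (assumed)
    return {"motion": 0.0, "detail": 0.0, "low_level": 1.0}


def total_loss(t: int, losses: dict, num_steps: int = 1000) -> float:
    """Combine per-term loss values using the weights of the active phase."""
    weights = phase_loss_weights(t, num_steps)
    return sum(weights[name] * losses[name] for name in weights)
```

With hard 0/1 weights only one term contributes at any timestep; a softer schedule would be an equally plausible reading of the abstract.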

Attention Diversification for Domain Generalization

Oct 09, 2022

Rang Meng, Xianfeng Li, Weijie Chen, Shicai Yang, Jie Song, Xinchao Wang, Lei Zhang, Mingli Song, Di Xie, Shiliang Pu

Abstract: Convolutional neural networks (CNNs) have demonstrated gratifying results in learning discriminative features. However, when applied to unseen domains, state-of-the-art models are usually prone to errors due to domain shift. After investigating this issue from the perspective of shortcut learning, we find that the devil lies in the fact that models trained on different domains are merely biased toward different domain-specific features yet overlook diverse task-related features. Under this guidance, a novel Attention Diversification framework is proposed, in which Intra-Model and Inter-Model Attention Diversification Regularization collaborate to reassign appropriate attention to diverse task-related features. Briefly, Intra-Model Attention Diversification Regularization is applied to the high-level feature maps to achieve in-channel discrimination and cross-channel diversification by forcing different channels to pay their most salient attention to different spatial locations. In addition, Inter-Model Attention Diversification Regularization is proposed to further provide task-related attention diversification and domain-related attention suppression, following a paradigm of "simulate, divide and assemble": simulate domain shift by exploiting multiple domain-specific models, divide attention maps into task-related and domain-related groups, and assemble them within each group to execute regularization. Extensive experiments and analyses on various benchmarks demonstrate that our method achieves state-of-the-art performance over other competing methods. Code is available at https://github.com/hikvision-research/DomainGeneralization.
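The idea of pushing different channels to attend to different spatial locations can be illustrated with a simple overlap penalty. This is a minimal sketch and not the paper's regularizer: the spatial softmax, the pairwise dot-product overlap, and the names `spatial_softmax` and `overlap_penalty` are assumptions chosen for illustration.

```python
import math


def spatial_softmax(channel):
    """Turn one flattened feature map into an attention distribution."""
    peak = max(channel)
    exps = [math.exp(v - peak) for v in channel]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def overlap_penalty(feature_maps):
    """Mean pairwise overlap between channel attention maps.

    feature_maps: list of C channels, each a flat list of H*W activations.
    The penalty is small when channels peak at different locations.
    """
    attention = [spatial_softmax(c) for c in feature_maps]
    total, pairs = 0.0, 0
    for i in range(len(attention)):
        for j in range(i + 1, len(attention)):
            total += sum(a * b for a, b in zip(attention[i], attention[j]))
            pairs += 1
    return total / pairs if pairs else 0.0
```

Two channels that peak at the same position yield a larger penalty than two channels that peak at different positions, so minimizing this term encourages cross-channel diversity.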

* European Conference on Computer Vision (ECCV 2022)

Slimmable Domain Adaptation

Jun 14, 2022

Rang Meng, Weijie Chen, Shicai Yang, Jie Song, Luojun Lin, Di Xie, Shiliang Pu, Xinchao Wang, Mingli Song, Yueting Zhuang

Abstract: Vanilla unsupervised domain adaptation methods tend to optimize the model with a fixed neural architecture, which is not very practical in real-world scenarios since the target data is usually processed by different resource-limited devices. It is therefore necessary to facilitate architecture adaptation across various devices. In this paper, we introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank, from which models of different capacities can be sampled to accommodate different accuracy-efficiency trade-offs. The main challenge in this framework lies in simultaneously boosting the adaptation performance of the numerous models in the model bank. To tackle this problem, we develop a Stochastic EnsEmble Distillation method to fully exploit the complementary knowledge in the model bank for inter-model interaction. Nevertheless, considering the optimization conflict between inter-model interaction and intra-model adaptation, we augment the existing bi-classifier domain confusion architecture into an Optimization-Separated Tri-Classifier counterpart. After optimizing the model bank, architecture adaptation is performed via our proposed Unsupervised Performance Evaluation Metric. Under various resource constraints, our framework surpasses other competing approaches by a very large margin on multiple benchmarks. It is also worth emphasizing that our framework preserves the performance improvement over the source-only model even when the computing complexity is reduced to $1/64$. Code will be available at https://github.com/hikvision-research/SlimDA.
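A weight-sharing model bank can be sketched with a single dense layer whose narrower variants reuse the leading rows and columns of one shared weight matrix. The example below is illustrative only: the helper names and the 64-by-64 layer are hypothetical, and reading the $1/64$ figure as a width multiplier of 1/8 applied to both input and output dimensions is an assumption, not something the abstract states.

```python
def slice_weights(weight, width):
    """Take the leading rows and columns of a shared weight matrix.

    weight: list of out_dim rows, each holding in_dim entries.
    width:  fraction in (0, 1] of units kept on both sides.
    """
    out_dim = max(1, int(len(weight) * width))
    in_dim = max(1, int(len(weight[0]) * width))
    return [row[:in_dim] for row in weight[:out_dim]]


def multiply_adds(weight):
    """Multiply-accumulate count of one matrix-vector product."""
    return len(weight) * len(weight[0])


def forward(weight, x):
    """Plain matrix-vector product, no bias."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


# Shared 64x64 layer; the 1/8-width sub-model reuses its top-left 8x8 block.
full = [[float(i + j) for j in range(64)] for i in range(64)]
small = slice_weights(full, 1 / 8)
ratio = multiply_adds(small) / multiply_adds(full)  # (1/8) * (1/8) = 1/64
```

Because cost scales with the product of input and output widths, shrinking both by a factor of 8 shrinks the multiply-accumulate count by a factor of 64.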

* IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2022

Neural Inheritance Relation Guided One-Shot Layer Assignment Search

Feb 28, 2020

Rang Meng, Weijie Chen, Di Xie, Yuan Zhang, Shiliang Pu

Abstract: Layer assignment is seldom singled out as an independent research topic in neural architecture search. In this paper, for the first time, we systematically investigate the impact of different layer assignments on network performance by building an architecture dataset of layer assignments on CIFAR-100. By analyzing this dataset, we discover a neural inheritance relation among networks with different layer assignments: the optimal layer assignments for deeper networks always inherit from those for shallower networks. Inspired by this neural inheritance relation, we propose an efficient one-shot layer assignment search approach via inherited sampling. Specifically, the optimal layer assignment found for the shallow network serves as a strong sampling prior for training and searching the deeper ones in the supernet, which greatly reduces the network search space. Comprehensive experiments carried out on CIFAR-100 illustrate the efficiency of our proposed method. Our search results are strongly consistent with the optimal ones directly selected from the architecture dataset. To further confirm the generalization of our proposed method, we also conduct experiments on Tiny-ImageNet and ImageNet. Our searched results are remarkably superior to the handcrafted ones under unchanged computational budgets. The neural inheritance relation discovered in this paper can provide insights for universal neural architecture search.
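The search-space reduction from inherited sampling can be made concrete with a small enumeration. The sketch below assumes that "inherit" means the deeper network keeps the shallow optimum and adds exactly one layer to one stage; that reading, the three-stage setup, and the function names are illustrative assumptions rather than details given in the abstract.

```python
from itertools import product


def all_assignments(depth, stages):
    """Every way to place `depth` layers into `stages` groups, each >= 1."""
    return [a for a in product(range(1, depth + 1), repeat=stages)
            if sum(a) == depth]


def inherited_candidates(best_shallow):
    """Assignments one layer deeper that keep the shallow optimum as a base."""
    candidates = []
    for stage in range(len(best_shallow)):
        child = list(best_shallow)
        child[stage] += 1  # add the extra layer to this stage
        candidates.append(tuple(child))
    return candidates
```

For a hypothetical shallow optimum of `(2, 3, 1)`, the unrestricted space at depth 7 contains 15 assignments, while inherited sampling under this assumption considers only 3 of them.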

* AAAI 2020

PIRM Challenge on Perceptual Image Enhancement on Smartphones: Report

Oct 03, 2018

Andrey Ignatov, Radu Timofte, Thang Van Vu, Tung Minh Luu, Trung X Pham, Cao Van Nguyen, Yongwoo Kim, Jae-Seok Choi, Munchurl Kim, Jie Huang(+38 more)

Abstract: This paper reviews the first challenge on efficient perceptual image enhancement, with a focus on deploying deep learning models on smartphones. The challenge consisted of two tracks. In the first, participants solved the classical image super-resolution problem with a bicubic downscaling factor of 4. The second track was aimed at real-world photo enhancement, where the goal was to map low-quality photos from the iPhone 3GS device to the same photos captured with a DSLR camera. The target metric used in this challenge combined the runtime, PSNR scores, and the solutions' perceptual results measured in a user study. To ensure the efficiency of the submitted models, we additionally measured their runtime and memory requirements on Android smartphones. The proposed solutions significantly improved on the baseline results, defining the state of the art for image enhancement on smartphones.
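PSNR, one component of the target metric, follows the standard definition $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. The sketch below computes it for flat lists of pixel values; the function name `psnr` and the 8-bit peak value of 255 are assumptions for illustration, and the challenge's exact evaluation protocol is not described in the abstract.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A uniform error of 10 intensity levels on an 8-bit image gives an MSE of 100 and therefore a PSNR of about 28.13 dB.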
