Chengjie Wang

EMOv2: Pushing 5M Vision Model Frontier

Dec 09, 2024
Via arXiv

DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation

Dec 04, 2024
Via arXiv

HiFiVFS: High Fidelity Video Face Swapping

Nov 27, 2024
Via arXiv

Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing

Nov 26, 2024
Via arXiv

Sonic: Shifting Focus to Global Audio Perception in Portrait Animation

Nov 25, 2024
Via arXiv

MobileMamba: Lightweight Multi-Receptive Visual Mamba Network

Nov 24, 2024
Via arXiv

FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on

Nov 22, 2024
Via arXiv

Textual Decomposition Then Sub-motion-space Scattering for Open-Vocabulary Motion Generation

Nov 06, 2024
Via arXiv

LLaVA-KD: A Framework of Distilling Multimodal Large Language Models

Oct 21, 2024
Via arXiv

MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection

Oct 12, 2024
Via arXiv