Jiangning Zhang

EMOv2: Pushing 5M Vision Model Frontier

Dec 09, 2024

Exploring Real&Synthetic Dataset and Linear Attention in Image Restoration

Dec 05, 2024

DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation

Dec 04, 2024

Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing

Nov 26, 2024

Sonic: Shifting Focus to Global Audio Perception in Portrait Animation

Nov 25, 2024

MobileMamba: Lightweight Multi-Receptive Visual Mamba Network

Nov 24, 2024

FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on

Nov 22, 2024

Textual Decomposition Then Sub-motion-space Scattering for Open-Vocabulary Motion Generation

Nov 06, 2024

LLaVA-KD: A Framework of Distilling Multimodal Large Language Models

Oct 21, 2024

OSV: One Step is Enough for High-Quality Image to Video Generation

Sep 17, 2024