Jingdong Wang

MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts

Oct 30, 2024

Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing

Oct 24, 2024

Improving Multi-modal Large Language Model through Boosting Vision Capabilities

Oct 17, 2024

TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model

Oct 14, 2024

Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation

Oct 10, 2024

MGMapNet: Multi-Granularity Representation Learning for End-to-End Vectorized HD Map Construction

Oct 10, 2024

Flipped Classroom: Aligning Teacher Attention with Student in Generalized Category Discovery

Sep 29, 2024

MonoFormer: One Transformer for Both Diffusion and Autoregression

Sep 24, 2024

SpotActor: Training-Free Layout-Controlled Consistent Image Generation

Sep 07, 2024

Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression

Sep 01, 2024