Jingdong Wang

Cosh-DiT: Co-Speech Gesture Video Synthesis via Hybrid Audio-Visual Diffusion Transformers

Mar 13, 2025
Via arXiv

MagicGeo: Training-Free Text-Guided Geometric Diagram Generation

Feb 19, 2025
Via arXiv

Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models

Jan 03, 2025
Via arXiv

Revisiting MLLMs: An In-Depth Analysis of Image Classification Abilities

Dec 21, 2024
Via arXiv

Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception

Dec 18, 2024
Via arXiv

Unbiased General Annotated Dataset Generation

Dec 14, 2024
Via arXiv

Visual Object Tracking across Diverse Data Modalities: A Review

Dec 13, 2024
Via arXiv

ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts

Dec 11, 2024
Via arXiv

OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation

Dec 03, 2024
Via arXiv

Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks

Dec 01, 2024
Via arXiv