Jingdong Wang

Visual Object Tracking across Diverse Data Modalities: A Review

Dec 13, 2024

ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts

Dec 11, 2024

OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation

Dec 03, 2024

Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks

Dec 01, 2024

TopoSD: Topology-Enhanced Lane Segment Perception with SDMap Prior

Nov 22, 2024

Continual SFT Matches Multimodal RLHF with Negative Supervision

Nov 22, 2024

DGTR: Distributed Gaussian Turbo-Reconstruction for Sparse-View Vast Scenes

Nov 20, 2024

MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts

Oct 30, 2024

Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing

Oct 24, 2024

Improving Multi-modal Large Language Model through Boosting Vision Capabilities

Oct 17, 2024