Jingdong Wang

Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models

Jan 03, 2025

Revisiting MLLMs: An In-Depth Analysis of Image Classification Abilities

Dec 21, 2024

Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception

Dec 18, 2024

Unbiased General Annotated Dataset Generation

Dec 14, 2024

Visual Object Tracking across Diverse Data Modalities: A Review

Dec 13, 2024

ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts

Dec 11, 2024

OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation

Dec 03, 2024

Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks

Dec 01, 2024

Continual SFT Matches Multimodal RLHF with Negative Supervision

Nov 22, 2024

TopoSD: Topology-Enhanced Lane Segment Perception with SDMap Prior

Nov 22, 2024