Picture for Jiahao Wang

Jiahao Wang

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Add code
Apr 15, 2025
Viaarxiv icon

C-FAITH: A Chinese Fine-Grained Benchmark for Automated Hallucination Evaluation

Add code
Apr 14, 2025
Viaarxiv icon

Any2Caption:Interpreting Any Condition to Caption for Controllable Video Generation

Add code
Mar 31, 2025
Viaarxiv icon

Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark

Add code
Mar 10, 2025
Viaarxiv icon

DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability

Add code
Mar 09, 2025
Viaarxiv icon

FuzzyLight: A Robust Two-Stage Fuzzy Approach for Traffic Signal Control Works in Real Cities

Add code
Jan 27, 2025
Viaarxiv icon

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

Add code
Dec 12, 2024
Figure 1 for PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Figure 2 for PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Figure 3 for PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Figure 4 for PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Viaarxiv icon

GenEx: Generating an Explorable World

Add code
Dec 12, 2024
Viaarxiv icon

VP-MEL: Visual Prompts Guided Multimodal Entity Linking

Add code
Dec 10, 2024
Viaarxiv icon

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Add code
Dec 06, 2024
Figure 1 for Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
Figure 2 for Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
Figure 3 for Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
Figure 4 for Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
Viaarxiv icon