Zhaokai Wang

GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing

Mar 12, 2026

InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing

Mar 10, 2026

GenExam: A Multidisciplinary Text-to-Image Exam

Sep 17, 2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Aug 25, 2025

OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use

Aug 06, 2025

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Apr 18, 2025

Vision-to-Music Generation: A Survey

Mar 27, 2025

TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation

Mar 10, 2025

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

Jan 14, 2025

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Dec 12, 2024