Zhaokai Wang

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

Jan 14, 2025

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Dec 12, 2024

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation

Dec 12, 2024

Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Composite Spatial Reasoning

Oct 21, 2024

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

Oct 10, 2024

Parameter-Inverted Image Pyramid Networks

Jun 06, 2024

Synergizing Spatial Optimization with Large Language Models for Open-Domain Urban Itinerary Planning

Feb 11, 2024

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft

Dec 14, 2023

Video Background Music Generation: Dataset, Method and Evaluation

Nov 21, 2022

Video Background Music Generation with Controllable Music Transformer

Nov 16, 2021