Zhaokai Wang

Vision-to-Music Generation: A Survey

Mar 27, 2025
Via arXiv

TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation

Mar 10, 2025
Via arXiv

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

Jan 14, 2025
Via arXiv

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation

Dec 12, 2024
Via arXiv

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Dec 12, 2024
Via arXiv

Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Composite Spatial Reasoning

Oct 21, 2024
Via arXiv

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

Oct 10, 2024
Via arXiv

Parameter-Inverted Image Pyramid Networks

Jun 06, 2024
Via arXiv

Synergizing Spatial Optimization with Large Language Models for Open-Domain Urban Itinerary Planning

Feb 11, 2024
Via arXiv

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft

Dec 14, 2023
Via arXiv