Qiushan Guo

Adaptive Spectral Feature Forecasting for Diffusion Sampling Acceleration

Mar 02, 2026
Via arXiv

VTok: A Unified Video Tokenizer with Decoupled Spatial-Temporal Latents

Feb 04, 2026
Via arXiv

Composable Visual Tokenizers with Generator-Free Diagnostics of Learnability

Feb 03, 2026
Via arXiv

Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

Dec 23, 2025
Via arXiv

MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation

May 25, 2025
Via arXiv

DanceGRPO: Unleashing GRPO on Visual Generation

May 12, 2025
Via arXiv

Seedream 3.0 Technical Report

Apr 16, 2025
Via arXiv

SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model

Jun 03, 2024
Via arXiv

Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots

May 13, 2024
Via arXiv

RegionGPT: Towards Region Understanding Vision Language Model

Mar 04, 2024
Via arXiv