Song Bai

Alibaba Group, University of Oxford

MOSEv2: A More Challenging Dataset for Video Object Segmentation in Complex Scenes

Aug 07, 2025

PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild

Apr 15, 2025

Liquid: Language Models are Scalable Multi-modal Generators

Dec 05, 2024

PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects

Jul 23, 2024

PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

Jun 24, 2024

DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data

Jun 07, 2024

Debiasing Text-to-Image Diffusion Models

Feb 22, 2024

Progress and Prospects in 3D Generative AI: A Technical Overview including 3D human

Jan 05, 2024

General Object Foundation Model for Images and Videos at Scale

Dec 14, 2023

Learning to Holistically Detect Bridges from Large-Size VHR Remote Sensing Imagery

Dec 05, 2023