Xiaoyi Bao

ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

Dec 15, 2025

Learning to Look at the Other Side: A Semantic Probing Study of Word Embeddings in LLMs with Enabled Bidirectional Attention

Oct 02, 2025

GigaVideo-1: Advancing Video Generation via Automatic Feedback with 4 GPU-Hours Fine-Tuning

Jun 12, 2025

Aligned Better, Listen Better for Audio-Visual Large Language Models

Apr 02, 2025

UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface

Mar 04, 2025

EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation

Nov 13, 2024

CoReS: Orchestrating the Dance of Reasoning and Segmentation

Apr 08, 2024

DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation

Mar 11, 2024

Understanding the Multi-modal Prompts of the Pre-trained Vision-Language Model

Dec 18, 2023

Relevant Intrinsic Feature Enhancement Network for Few-Shot Semantic Segmentation

Dec 11, 2023