Cong Wei

Visual-Aware CoT: Achieving High-Fidelity Visual Consistency in Unified Models

Dec 22, 2025
Via arXiv

UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models

Dec 12, 2025
Via arXiv

MoCha: Towards Movie-Grade Talking Character Synthesis

Mar 30, 2025
Via arXiv

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

Mar 14, 2025
Via arXiv

A Survey on Data-Centric AI: Tabular Learning from Reinforcement Learning and Generative AI Perspective

Feb 12, 2025
Via arXiv

InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models

Dec 18, 2024
Via arXiv

VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation

Dec 01, 2024
Via arXiv

HyperSeg: Towards Universal Visual Segmentation with Large Language Model

Nov 26, 2024
Via arXiv

OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision

Nov 11, 2024
Via arXiv

MANTIS: Interleaved Multi-Image Instruction Tuning

May 02, 2024
Via arXiv