Xiawu Zheng

Solving the Catastrophic Forgetting Problem in Generalized Category Discovery

Jan 09, 2025
Via arXiv

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Jan 03, 2025
Via arXiv

Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension

Nov 20, 2024
Via arXiv

VITA: Towards Open-Source Interactive Omni Multimodal LLM

Aug 09, 2024
Via arXiv

Multi-branch Collaborative Learning Network for 3D Visual Grounding

Jul 10, 2024
Via arXiv

Efficient Event Stream Super-Resolution with Recursive Multi-Branch Fusion

Jun 28, 2024
Via arXiv

Local Manifold Learning for No-Reference Image Quality Assessment

Jun 27, 2024
Via arXiv

Depth-Guided Semi-Supervised Instance Segmentation

Jun 25, 2024
Via arXiv

VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models

Jun 14, 2024
Via arXiv

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

May 31, 2024
Via arXiv