Bo Zhao

MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding

Jun 06, 2024

VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval

Jun 06, 2024

Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking

Jun 06, 2024

The SkatingVerse Workshop & Challenge: Methods and Results

May 27, 2024

VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding

May 22, 2024

Efficient Multimodal Large Language Models: A Survey

May 17, 2024

Understanding the Difficulty of Solving Cauchy Problems with PINNs

May 04, 2024

Advances and Open Challenges in Federated Learning with Foundation Models

Apr 29, 2024

Tele-FLM Technical Report

Apr 25, 2024

M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models

Mar 31, 2024