Chaoyou Fu

MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning

Nov 05, 2024

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

Nov 01, 2024

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

Aug 23, 2024

VITA: Towards Open-Source Interactive Omni Multimodal LLM

Aug 09, 2024

VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models

Jun 14, 2024

Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models

Jun 12, 2024

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

May 31, 2024

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

Apr 24, 2024

No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation

Apr 05, 2024

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

Dec 20, 2023