Mengdan Zhang

VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models

Jun 14, 2024
Via arXiv

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

May 31, 2024
Via arXiv

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

Apr 24, 2024
Via arXiv

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

Dec 20, 2023
Via arXiv

Aligning and Prompting Everything All at Once for Universal Visual Perception

Dec 04, 2023
Via arXiv

Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection

Aug 30, 2023
Via arXiv

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

Jul 02, 2023
Via arXiv

Multi-modal Queried Object Detection in the Wild

May 30, 2023
Via arXiv

Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization

Jun 24, 2022
Via arXiv

Efficient Decoder-free Object Detection with Transformers

Jun 17, 2022
Via arXiv