Peixian Chen

VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models

Jun 14, 2024
Via arXiv

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

May 31, 2024
Via arXiv

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

Apr 24, 2024
Via arXiv

SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation

Apr 04, 2024
Via arXiv

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

Dec 20, 2023
Via arXiv

Aligning and Prompting Everything All at Once for Universal Visual Perception

Dec 04, 2023
Via arXiv

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

Jul 02, 2023
Via arXiv

Multi-modal Queried Object Detection in the Wild

May 30, 2023
Via arXiv

Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization

Jun 24, 2022
Via arXiv

Efficient Decoder-free Object Detection with Transformers

Jun 17, 2022
Via arXiv