Shuai Bai

CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy

Dec 03, 2024
Via arXiv

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Sep 18, 2024
Via arXiv

Qwen2 Technical Report

Jul 16, 2024
Via arXiv

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Mar 11, 2024
Via arXiv

GD^2-NeRF: Generative Detail Compensation via GAN and Diffusion for One-shot Generalizable Neural Radiance Fields

Jan 02, 2024
Via arXiv

Qwen Technical Report

Sep 28, 2023
Via arXiv

Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

Sep 14, 2023
Via arXiv

TouchStone: Evaluating Vision-Language Models by Language Models

Sep 04, 2023
Via arXiv

ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

May 18, 2023
Via arXiv

OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models

Dec 08, 2022
Via arXiv