Junbo Cui

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

Oct 14, 2024

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Aug 03, 2024

GUICourse: From General Vision Language Models to Versatile GUI Agents

Jun 17, 2024

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Mar 18, 2024