Ruipu Luo

Valley2: Exploring Multimodal Models with Scalable Vision-Language Design

Jan 13, 2025
Via arXiv

VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models

May 28, 2024
Via arXiv

DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning

Apr 02, 2024
Via arXiv

Breaking Down the Task: A Unit-Grained Hybrid Training Framework for Vision and Language Decision Making

Jul 16, 2023
Via arXiv

Valley: Video Assistant with Large Language model Enhanced abilitY

Jun 12, 2023
Via arXiv