Ruipu Luo

VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models

May 28, 2024
Via arXiv

DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning

Apr 02, 2024
Via arXiv

Breaking Down the Task: A Unit-Grained Hybrid Training Framework for Vision and Language Decision Making

Jul 16, 2023
Via arXiv

Valley: Video Assistant with Large Language model Enhanced abilitY

Jun 12, 2023
Via arXiv