Yanyuan Qiao

MM-LDM: Multi-Modal Latent Diffusion Model for Sounding Video Generation

Oct 02, 2024

Effective Tuning Strategies for Generalist Robot Manipulation Policies

Oct 02, 2024

MiniVLN: Efficient Vision-and-Language Navigation by Progressive Knowledge Distillation

Sep 27, 2024

Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs

Sep 27, 2024

Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models

Jul 09, 2024

VL-Mamba: Exploring State Space Models for Multimodal Learning

Mar 20, 2024

Improving Online Source-free Domain Adaptation for Object Detection by Unsupervised Data Acquisition

Oct 30, 2023

March in Chat: Interactive Prompting for Remote Embodied Referring Expression

Aug 20, 2023

VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation

Aug 20, 2023

HOP: History-and-Order Aware Pre-training for Vision-and-Language Navigation

Mar 22, 2022