Peihao Chen

FlexAttention for Efficient High-Resolution Vision-Language Models

Jul 29, 2024

CoNav: A Benchmark for Human-Centered Collaborative Navigation

Jun 04, 2024

MAGIC: Map-Guided Few-Shot Audio-Visual Acoustics Modeling

May 22, 2024

3D-VLA: A 3D Vision-Language-Action Generative World Model

Mar 14, 2024

MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World

Jan 16, 2024

A Simple Knowledge Distillation Framework for Open-world Object Detection

Dec 14, 2023

DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning

Dec 10, 2023

CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding

Nov 06, 2023

FGPrompt: Fine-grained Goal Prompting for Image-goal Navigation

Oct 11, 2023

$A^2$Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models

Aug 15, 2023