Baifeng Shi

NVILA: Efficient Frontier Visual Language Models

Dec 05, 2024

LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning

Jun 17, 2024

When Do We Not Need Larger Vision Models?

Mar 19, 2024

Humanoid Locomotion as Next Token Prediction

Feb 29, 2024

Rethinking Patch Dependence for Masked Autoencoders

Jan 25, 2024

Recursive Visual Programming

Dec 04, 2023

LLM-grounded Video Diffusion Models

Oct 02, 2023

Robot Learning with Sensorimotor Pre-training

Jun 16, 2023

Refocusing Is Key to Transfer Learning

May 24, 2023

Top-Down Visual Attention from Analysis by Synthesis

Mar 24, 2023