Li Erran Li

Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making

Oct 09, 2024
Via arXiv

SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge

May 17, 2024
Via arXiv

Language Models Can Reduce Asymmetry in Information Markets

Mar 21, 2024
Via arXiv

Compositional 3D Scene Synthesis with Scene Graph Guided Layout-Shape Generation

Mar 19, 2024
Via arXiv

Learning 3D object-centric representation through prediction

Mar 06, 2024
Via arXiv

ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling

Feb 09, 2024
Via arXiv

AffordanceLLM: Grounding Affordance from Vision Language Models

Jan 12, 2024
Via arXiv

The Role of Linguistic Priors in Measuring Compositional Generalization of Vision-Language Models

Oct 04, 2023
Via arXiv

DAT++: Spatially Dynamic Vision Transformer with Deformable Attention

Sep 04, 2023
Via arXiv

GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields

Sep 01, 2023
Via arXiv