Li Fei-Fei

Stanford University

Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation

Mar 20, 2025 (via arXiv)

Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization

Mar 14, 2025 (via arXiv)

Towards Fine-Grained Video Question Answering

Mar 10, 2025 (via arXiv)

BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities

Mar 07, 2025 (via arXiv)

A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards

Feb 12, 2025 (via arXiv)

s1: Simple test-time scaling

Jan 31, 2025 (via arXiv)

Why Automate This? Exploring the Connection between Time Use, Well-being and Robot Automation Across Social Groups

Jan 10, 2025 (via arXiv)

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Dec 18, 2024 (via arXiv)

The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion

Dec 13, 2024 (via arXiv)

HourVideo: 1-Hour Video-Language Understanding

Nov 07, 2024 (via arXiv)