Li Fei-Fei

Stanford University

Re-thinking Temporal Search for Long-Form Video Understanding

Apr 03, 2025

WorldScore: A Unified Evaluation Benchmark for World Generation

Apr 01, 2025

Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation

Mar 20, 2025

Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization

Mar 14, 2025

Towards Fine-Grained Video Question Answering

Mar 10, 2025

BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities

Mar 07, 2025

A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards

Feb 12, 2025

s1: Simple test-time scaling

Jan 31, 2025

Why Automate This? Exploring the Connection between Time Use, Well-being and Robot Automation Across Social Groups

Jan 10, 2025

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Dec 18, 2024