Rohit Girdhar

Jack

Diffusion Autoencoders are Scalable Image Tokenizers

Jan 30, 2025

LLMs can see and hear without any training

Jan 30, 2025

MotiF: Making Text Count in Image Animation with Motion Focal Loss

Dec 20, 2024

Human Action Anticipation: A Survey

Oct 17, 2024

Movie Gen: A Cast of Media Foundation Models

Oct 17, 2024

The Llama 3 Herd of Models

Jul 31, 2024

SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos

Apr 08, 2024

InstanceDiffusion: Instance-level Control for Image Generation

Feb 05, 2024

Generating Illustrated Instructions

Dec 07, 2023

Motion-Conditioned Image Animation for Video Editing

Nov 30, 2023