Reuben Tan

SAT: Spatial Aptitude Training for Multimodal Language Models
Dec 10, 2024

Latent Action Pretraining from Videos
Oct 15, 2024

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
Oct 15, 2024

Koala: Key frame-conditioned long video-LLM
Apr 05, 2024

Socratis: Are large multimodal models emotionally aware?
Sep 05, 2023

Multiscale Video Pretraining for Long-Term Activity Forecasting
Jul 24, 2023

EgoAdapt: A multi-stream evaluation study of adaptation to real-world egocentric user video
Jul 11, 2023

Language-Guided Audio-Visual Source Separation via Trimodal Consistency
Mar 28, 2023

NewsStories: Illustrating articles with visual summaries
Aug 14, 2022

Look at What I'm Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos
Oct 20, 2021