Reuben Tan

Learning Sparse Visual Representations via Spatial-Semantic Factorization

Feb 02, 2026

VideoWeave: A Data-Centric Approach for Efficient Video Understanding

Jan 09, 2026

SITE: towards Spatial Intelligence Thorough Evaluation

May 08, 2025

Magma: A Foundation Model for Multimodal AI Agents

Feb 18, 2025

SAT: Spatial Aptitude Training for Multimodal Language Models

Dec 10, 2024

Latent Action Pretraining from Videos

Oct 15, 2024

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

Oct 15, 2024

Koala: Key frame-conditioned long video-LLM

Apr 05, 2024

Socratis: Are large multimodal models emotionally aware?

Sep 05, 2023

Multiscale Video Pretraining for Long-Term Activity Forecasting

Jul 24, 2023