Skanda Koppula

Scaling 4D Representations

Dec 19, 2024
Via arXiv

PaliGemma: A versatile 3B VLM for transfer

Jul 10, 2024
Via arXiv

TAPVid-3D: A Benchmark for Tracking Any Point in 3D

Jul 08, 2024
Via arXiv

Memory Consolidation Enables Long-Context Video Understanding

Feb 08, 2024
Via arXiv

BootsTAP: Bootstrapped Training for Tracking-Any-Point

Feb 01, 2024
Via arXiv

A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames

Dec 12, 2023
Via arXiv

Perception Test: A Diagnostic Benchmark for Multimodal Video Models

May 23, 2023
Via arXiv

Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation

Apr 13, 2023
Via arXiv

Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods

Oct 06, 2022
Via arXiv

Object discovery and representation networks

Mar 16, 2022
Via arXiv