Gedas Bertasius

ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis

Mar 15, 2025
Via arXiv

BIMBA: Selective-Scan Compression for Long-Range Video Question Answering

Mar 13, 2025
Via arXiv

BOSS: Benchmark for Observation Space Shift in Long-Horizon Task

Feb 21, 2025
Via arXiv

TimeRefine: Temporal Grounding with Time Refining Video LLM

Dec 12, 2024
Via arXiv

ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos

Nov 22, 2024
Via arXiv

ARCADE: Scalable Demonstration Collection and Generation via Augmented Reality for Imitation Learning

Oct 21, 2024
Via arXiv

Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos

Sep 30, 2024
Via arXiv

VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos

Sep 11, 2024
Via arXiv

VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos

May 29, 2024
Via arXiv

Siamese Vision Transformers are Scalable Audio-visual Learners

Mar 28, 2024
Via arXiv