Kumara Kahatapitiya

Adaptive Caching for Faster Video Generation with Diffusion Transformers

Nov 04, 2024

MarDini: Masked Autoregressive Diffusion for Video Generation at Scale

Oct 26, 2024

LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

Jun 28, 2024

Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA

Jun 17, 2024

Understanding Long Videos in One Multimodal Language Model Pass

Mar 25, 2024

Language Repository for Long Video Understanding

Mar 21, 2024

Object-Centric Diffusion for Efficient Video Editing

Jan 11, 2024

VicTR: Video-conditioned Text Representations for Activity Recognition

Apr 05, 2023

Token Turing Machines

Nov 16, 2022

Grafting Vision Transformers

Oct 28, 2022