Kumara Kahatapitiya

Adaptive Caching for Faster Video Generation with Diffusion Transformers

Nov 04, 2024
Via arXiv

MarDini: Masked Autoregressive Diffusion for Video Generation at Scale

Oct 26, 2024
Via arXiv

LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

Jun 28, 2024
Via arXiv

Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA

Jun 17, 2024
Via arXiv

Understanding Long Videos in One Multimodal Language Model Pass

Mar 25, 2024
Via arXiv

Language Repository for Long Video Understanding

Mar 21, 2024
Via arXiv

Object-Centric Diffusion for Efficient Video Editing

Jan 11, 2024
Via arXiv

VicTR: Video-conditioned Text Representations for Activity Recognition

Apr 05, 2023
Via arXiv

Token Turing Machines

Nov 16, 2022
Via arXiv

Grafting Vision Transformers

Oct 28, 2022
Via arXiv