Willi Menapace

AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation

Nov 07, 2024
Via arXiv

VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control

Jul 17, 2024
Via arXiv

VIMI: Grounding Video Generation through Multi-modal Instruction

Jul 08, 2024
Via arXiv

Taming Data and Transformers for Audio Generation

Jun 27, 2024
Via arXiv

Hierarchical Patch Diffusion Models for High-Resolution Video Generation

Jun 12, 2024
Via arXiv

4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

Jun 11, 2024
Via arXiv

SF-V: Single Forward Video Generation Model

Jun 06, 2024
Via arXiv

Harnessing Large Language Models for Training-free Video Anomaly Detection

Apr 01, 2024
Via arXiv

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Feb 29, 2024
Via arXiv

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

Feb 22, 2024
Via arXiv