Picture for Hartwig Adam

Hartwig Adam

$ε$-VAE: Denoising as Visual Decoding

Add code
Oct 05, 2024
Viaarxiv icon

VideoPrism: A Foundational Visual Encoder for Video Understanding

Add code
Feb 20, 2024
Figure 1 for VideoPrism: A Foundational Visual Encoder for Video Understanding
Figure 2 for VideoPrism: A Foundational Visual Encoder for Video Understanding
Figure 3 for VideoPrism: A Foundational Visual Encoder for Video Understanding
Figure 4 for VideoPrism: A Foundational Visual Encoder for Video Understanding
Viaarxiv icon

Distilling Vision-Language Models on Millions of Videos

Add code
Jan 11, 2024
Figure 1 for Distilling Vision-Language Models on Millions of Videos
Figure 2 for Distilling Vision-Language Models on Millions of Videos
Figure 3 for Distilling Vision-Language Models on Millions of Videos
Figure 4 for Distilling Vision-Language Models on Millions of Videos
Viaarxiv icon

VideoPoet: A Large Language Model for Zero-Shot Video Generation

Add code
Dec 21, 2023
Viaarxiv icon

PolyMaX: General Dense Prediction with Mask Transformer

Add code
Nov 09, 2023
Figure 1 for PolyMaX: General Dense Prediction with Mask Transformer
Figure 2 for PolyMaX: General Dense Prediction with Mask Transformer
Figure 3 for PolyMaX: General Dense Prediction with Mask Transformer
Figure 4 for PolyMaX: General Dense Prediction with Mask Transformer
Viaarxiv icon

SANPO: A Scene Understanding, Accessibility, Navigation, Pathfinding, Obstacle Avoidance Dataset

Add code
Sep 21, 2023
Figure 1 for SANPO: A Scene Understanding, Accessibility, Navigation, Pathfinding, Obstacle Avoidance Dataset
Figure 2 for SANPO: A Scene Understanding, Accessibility, Navigation, Pathfinding, Obstacle Avoidance Dataset
Figure 3 for SANPO: A Scene Understanding, Accessibility, Navigation, Pathfinding, Obstacle Avoidance Dataset
Figure 4 for SANPO: A Scene Understanding, Accessibility, Navigation, Pathfinding, Obstacle Avoidance Dataset
Viaarxiv icon

VideoGLUE: Video General Understanding Evaluation of Foundation Models

Add code
Jul 06, 2023
Viaarxiv icon

Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception

Add code
May 10, 2023
Viaarxiv icon

Unified Visual Relationship Detection with Vision and Language Models

Add code
Mar 16, 2023
Viaarxiv icon

Improving Zero-shot Generalization and Robustness of Multi-modal Models

Add code
Dec 04, 2022
Viaarxiv icon