Long Zhao

Rutgers University

$ε$-VAE: Denoising as Visual Decoding

Oct 05, 2024

Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

Jul 18, 2024

VideoPrism: A Foundational Visual Encoder for Video Understanding

Feb 20, 2024

Distilling Vision-Language Models on Millions of Videos

Jan 11, 2024

Generating Enhanced Negatives for Training Language-Based Object Detectors

Dec 29, 2023

Deep Deformable Models: Learning 3D Shape Abstractions with Part Consistency

Sep 02, 2023

Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition

Aug 23, 2023

Improving Pseudo Labels for Open-Vocabulary Object Detection

Aug 11, 2023

VideoGLUE: Video General Understanding Evaluation of Foundation Models

Jul 06, 2023

Spatiotemporally Discriminative Video-Language Pre-Training with Text Grounding

Mar 28, 2023