Long Zhao

Rutgers University

The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering

Feb 05, 2025

Video Creation by Demonstration

Dec 12, 2024

ε-VAE: Denoising as Visual Decoding

Oct 05, 2024

Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

Jul 18, 2024

VideoPrism: A Foundational Visual Encoder for Video Understanding

Feb 20, 2024

Distilling Vision-Language Models on Millions of Videos

Jan 11, 2024

Generating Enhanced Negatives for Training Language-Based Object Detectors

Dec 29, 2023

Deep Deformable Models: Learning 3D Shape Abstractions with Part Consistency

Sep 02, 2023

Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition

Aug 23, 2023

Improving Pseudo Labels for Open-Vocabulary Object Detection

Aug 11, 2023