Honglu Zhou

Unifying Specialized Visual Encoders for Video Language Models

Jan 02, 2025
Via arXiv

ViUniT: Visual Unit Tests for More Robust Visual Programming

Dec 12, 2024
Via arXiv

xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs

Oct 21, 2024
Via arXiv

Domain-Guided Weight Modulation for Semi-Supervised Domain Generalization

Sep 04, 2024
Via arXiv

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Aug 22, 2024
Via arXiv

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Aug 16, 2024
Via arXiv

Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos

Mar 05, 2024
Via arXiv

Learning from Synthetic Human Group Activities

Jul 16, 2023
Via arXiv

Procedure-Aware Pretraining for Instructional Video Understanding

Mar 31, 2023
Via arXiv

MSI: Maximize Support-Set Information for Few-Shot Segmentation

Dec 09, 2022
Via arXiv