Xingyi Zhou

Neptune: The Long Orbit to Benchmarking Long Video Understanding

Dec 12, 2024
Via arXiv

Visual Lexicon: Rich Image Features in Language Space

Dec 09, 2024
Via arXiv

STT: Stateful Tracking with Transformers for Autonomous Driving

Apr 30, 2024
Via arXiv

Streaming Dense Video Captioning

Apr 01, 2024
Via arXiv

Distilling Vision-Language Models on Millions of Videos

Jan 11, 2024
Via arXiv

Pixel Aligned Language Models

Dec 14, 2023
Via arXiv

MaskConver: Revisiting Pure Convolution Model for Panoptic Segmentation

Dec 11, 2023
Via arXiv

Does Visual Pretraining Help End-to-End Reasoning?

Jul 17, 2023
Via arXiv

How can objects help action recognition?

Jun 20, 2023
Via arXiv

Dense Video Object Captioning from Disjoint Supervision

Jun 20, 2023
Via arXiv