Shen Yan

CompCap: Improving Multimodal Large Language Models with Composite Captions

Dec 06, 2024

Autoregressive Models in Vision: A Survey

Nov 08, 2024

LoD-Loc: Aerial Visual Localization using LoD 3D Map with Neural Wireframe Alignment

Oct 16, 2024

Streaming Dense Video Captioning

Apr 01, 2024

VideoPrism: A Foundational Visual Encoder for Video Understanding

Feb 20, 2024

PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter

Feb 16, 2024

UAVD4L: A Large-Scale Dataset for UAV 6-DoF Localization

Jan 11, 2024

Efficient Large Language Models: A Survey

Dec 23, 2023

Pixel Aligned Language Models

Dec 14, 2023

UnLoc: A Unified Framework for Video Localization Tasks

Aug 21, 2023