Picture for Ziyu Wei

Ziyu Wei

LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding

Add code
Jan 14, 2025
Viaarxiv icon