Hoang-Nam Le

SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization

Dec 21, 2024
Via arXiv