Phu-Vinh Nguyen

SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization

Dec 21, 2024
Via arXiv