Wentao Mo

Image-based Multimodal Models as Intruders: Transferable Multimodal Attacks on Video-based MLLMs

Jan 02, 2025
Via arXiv

3D Vision and Language Pretraining with Large-Scale Synthetic Data

Jul 08, 2024
Via arXiv

Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA

Feb 24, 2024
Via arXiv