Picture for Haojia Lin

Haojia Lin

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Add code
Jan 03, 2025
Viaarxiv icon

Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension

Add code
Nov 20, 2024
Figure 1 for Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension
Figure 2 for Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension
Figure 3 for Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension
Figure 4 for Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension
Viaarxiv icon

VITA: Towards Open-Source Interactive Omni Multimodal LLM

Add code
Aug 09, 2024
Figure 1 for VITA: Towards Open-Source Interactive Omni Multimodal LLM
Figure 2 for VITA: Towards Open-Source Interactive Omni Multimodal LLM
Figure 3 for VITA: Towards Open-Source Interactive Omni Multimodal LLM
Figure 4 for VITA: Towards Open-Source Interactive Omni Multimodal LLM
Viaarxiv icon

Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization

Add code
Apr 17, 2024
Viaarxiv icon

A Unified Framework for 3D Point Cloud Visual Grounding

Add code
Aug 23, 2023
Viaarxiv icon

Meta Architecure for Point Cloud Analysis

Add code
Nov 26, 2022
Viaarxiv icon