Picture for Chunxin Fang

Chunxin Fang

OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding

Add code
Jul 06, 2024
Figure 1 for OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding
Figure 2 for OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding
Figure 3 for OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding
Figure 4 for OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding
Viaarxiv icon

How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection

Add code
Aug 25, 2023
Viaarxiv icon