Chuofan Ma

Liquid: Language Models are Scalable Multi-modal Generators

Dec 05, 2024

Granularity Matters in Long-Tail Learning

Oct 21, 2024

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

Apr 19, 2024

Recognize Any Regions

Nov 02, 2023

CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection

Oct 25, 2023

EGC: Image Generation and Classification via a Diffusion Energy-Based Model

Apr 13, 2023

Rethinking Resolution in the Context of Efficient Video Recognition

Sep 26, 2022