
Longtian Qiu

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

May 09, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

Feb 08, 2024

Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training

Jan 04, 2024

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

Dec 20, 2023

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models

Nov 13, 2023

HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models

Mar 29, 2023

CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention

Sep 28, 2022

VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts

Dec 04, 2021