Fangxun Shu

T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts

Dec 05, 2024

SAG: Style-Aligned Article Generation via Model Collaboration

Oct 04, 2024

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

Aug 28, 2024

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

Jul 11, 2024

Autoregressive Pretraining with Mamba in Vision

Jun 11, 2024

HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models

Mar 20, 2024

Audio-Visual LLM for Video Understanding

Dec 13, 2023

Compress & Align: Curating Image-Text Data with Human Knowledge

Dec 13, 2023

Masked Contrastive Pre-Training for Efficient Video-Text Retrieval

Dec 05, 2022