Fangxun Shu

CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation

Mar 07, 2025

MINT: Multi-modal Chain of Thought in Unified Generative Models for Enhanced Image Generation

Mar 03, 2025

T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts

Dec 05, 2024

SAG: Style-Aligned Article Generation via Model Collaboration

Oct 04, 2024

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

Aug 28, 2024

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

Jul 11, 2024

Autoregressive Pretraining with Mamba in Vision

Jun 11, 2024

HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models

Mar 20, 2024

Audio-Visual LLM for Video Understanding

Dec 13, 2023

Compress & Align: Curating Image-Text Data with Human Knowledge

Dec 13, 2023